RELATED APPLICATIONS

This application is a continuation-in-part of U.S. patent application serial no. 08/639,128 filed April 26, 1996, which is a continuation of serial no. 08/193,707 filed February 2, 1994, which is a continuation of serial no. 07/820,364 filed January 14, 1992, now U.S. patent 5,313,421.

FIELD OF THE INVENTION

This invention pertains to semiconductor memory devices and particularly to multi-state memories.

BACKGROUND OF THE INVENTION

As is well known, in a semiconductor memory cell, data is stored by programming the cell to have a desired threshold voltage. Simple memory cells store one of two states, a logical one or a logical zero, in which case the cell is programmed to either turn on or not turn on, respectively, when read conditions are established, thereby allowing the read operation to determine whether a logical one or a logical zero has been stored in the memory cell. More sophisticated semiconductor memory cells can store one of more than two memory states by supporting a variety of threshold voltages in the memory cell, each threshold voltage being associated with one of more than two logical states. Such multi-state memory cells and arrays are described, for example, in U.S. patents 5,043,940 and 5,434,825 issued on inventions of Dr. Eliyahou Harari.

In order to fully exploit the concept of high-density multi-state memory devices, the memory states must be packed as closely together as possible, with minimal threshold separation for margin/discrimination overhead. Factors that dictate this overhead are noise, drift (particularly random as opposed to common-mode drift), sensing speed (deltaT = C*deltaV/I), and safety-margin guard bands, as well as the precision and stability of reference sources and sense circuits. This overhead must be added to the memory state width associated with the precision of writing the memory cells (again with respect to the reference sources). With a closed-loop write, in which a write is followed by a verify operation and cells that fail the verify operation are rewritten, the precision of the memory cell relative to the reference source can be made arbitrarily high by expending more time in writing. State packing will then be dictated more by how precisely and stably the various storage sense points can be separated from one another, a property of both memory state stability and how reference points/elements are established.
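As a rough illustration of the closed-loop write, the following minimal Python sketch assumes a hypothetical, idealized cell in which every program pulse lowers the read current by exactly `step_ua`, and a verify that passes once the current is at or below the target. The function name and the linear cell model are illustrative assumptions, not the circuitry of this invention; real cells respond nonlinearly and with noise.

```python
def closed_loop_write(initial_ua: int, target_ua: int, step_ua: int,
                      max_pulses: int = 10_000) -> tuple[int, int]:
    """Program-and-verify loop on a hypothetical linear cell model.

    Returns (final_current_ua, pulses_used). Each pulse is assumed to remove
    exactly `step_ua` of cell current; verify passes when current <= target.
    """
    current = initial_ua
    pulses = 0
    while current > target_ua:           # verify failed: cell still conducts too much
        if pulses >= max_pulses:
            raise RuntimeError("cell did not verify within the pulse budget")
        current -= step_ua               # apply one more program pulse
        pulses += 1
    return current, pulses


if __name__ == "__main__":
    for step in (8, 4, 1):
        final, n = closed_loop_write(100, 65, step)
        print(f"step={step} uA: final={final} uA after {n} pulses")
```

Because the loop stops at the first pulse that passes verify, the final current always lands less than one step below the target; shrinking the step tightens that precision but raises the pulse count, which is the time-for-precision trade described above.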

SUMMARY

Maximized multi-state compaction and greater tolerance in memory state behavior are achieved through a flexible, self-consistent and self-adapting mode of detection covering a wide dynamic range. For high-density multi-state encoding, this approach borders on full analog treatment, dictating analog techniques, including A/D-type conversion, to reconstruct and process the data. In accordance with the teachings of this invention, the memory array is read with high fidelity, not to provide the actual final digital data, but rather to provide raw data accurately reflecting the analog storage state; this information is sent to a memory controller for analysis and detection of the actual final digital data.

One goal of the present invention is to provide self-consistent, adaptive and tracking capability for sensing, capable of establishing both the data and the "quality" of the data (i.e. the margins). In accordance with certain embodiments of this invention, tracking cells are included within each of the sectors. These tracking cells are set at known states to reliably establish the optimum discrimination points for each of the various states. In certain embodiments, this is accomplished using as few as one cell per state. However, if better statistics are vital to establishing the optimum discrimination point, a small population of cells sufficient to establish such optimum points statistically is used. Data from these tracking cells will be the first information from the sector to be read into the controller, in order to establish the optimum discrimination points for the remainder of the sector data. In order to make these cells track the rest of the sector in terms of data history and wear, they are subjected to the same logical-to-physical data state translation (rotation) writing as used for their associated sectors.

In accordance with various alternative embodiments of this invention, high-density multi-state memories are taught which include: parallel, full-chunk A/D conversion of multi-state data, with adequate resolution to provide an analog measure of the encoded states; master reference cell(s) whose prime function is to provide optimum dynamic range for comparator sensing; logical-to-physical data scrambling to provide both intra-sector wear leveling and increased endurance capability; and intra-sector tracking cell groups, one for each state, included in each sector to provide optimum compare points for the various states and able to adapt to any common-mode shifts (e.g. detrapping). In accordance with certain embodiments, a controller incorporates a data processing "engine" to find, on the fly, the midpoints of each tracking cell group. The controller also establishes data state discrimination and marginality filter points. Sector data is passed through the controller, giving both the encoded memory state and its quality (marginality) for each physical bit. If desired, the controller decides what actions must be taken to clean up (scrub) marginal bit data based on the quality information (e.g. a full sector erase and rewrite versus a selective write only). Also, if desired, the invention includes a small counter on each sector which is incremented each time a read scrub is encountered. When the count reaches the maximum allowed, the marginal bit(s) are mapped out rather than rewritten, and the counter is reset to 0. This provides a filter for truly "bad" bits. Similar features are applied in reverse to write multi-state data back into a sector, using the same circuitry as used for read but operated in reverse, to provide self-consistent data encoding. In addition, two alternative embodiments for performing verification are taught: using a reference current staircase to sequentially scan through the range of states, conditionally terminating each cell as the current step corresponding to its target data is presented to the sensing circuit; and using a full set of N-1 reference currents for the N possible states to simultaneously verify and conditionally terminate all cells. In certain embodiments, a twin-cell option is included in each sector to provide a measure of the deltaVt shift level associated with cycling-driven trapping and channel wearout, triggering sector retirement before detrapping shifts exceed the read dynamic range or cause other potential read errors. This replaces hot-count-based sector retirement, greatly increasing usable endurance.
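The per-sector read-scrub counter can be sketched as a small state machine. This is a minimal sketch under assumptions: the class name, the `max_scrubs` limit, and modeling mapped-out bits as entries in a set are hypothetical; the text specifies only the increment on each read scrub, the comparison against a maximum, the map-out instead of a rewrite, and the reset to 0.

```python
from dataclasses import dataclass, field


@dataclass
class SectorScrubCounter:
    """Hypothetical model of the small per-sector read-scrub counter.

    `max_scrubs` is an assumed limit; the text only says "maximum allowed".
    """
    max_scrubs: int
    count: int = 0
    mapped_out: set = field(default_factory=set)

    def on_read_scrub(self, marginal_bits: set) -> str:
        """Record one read scrub and return the action chosen for the marginal bits."""
        self.count += 1
        if self.count >= self.max_scrubs:
            # Limit reached: treat these bits as truly bad and map them out.
            self.mapped_out |= set(marginal_bits)
            self.count = 0
            return "map_out"
        # Below the limit, the marginal bits are simply rewritten (scrubbed).
        return "rewrite"


if __name__ == "__main__":
    sector = SectorScrubCounter(max_scrubs=3)
    print([sector.on_read_scrub({17}) for _ in range(4)], sector.mapped_out)
```

With a limit of three, the first two scrubs rewrite the marginal bit, the third maps it out and clears the counter, and counting then starts afresh.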

As another feature of certain embodiments of this invention, a cell-by-cell, column-oriented steering approach, realizable in two source-side injection cell embodiments, significantly increases the performance of high-level multi-state storage, improving both its write and read speed. It achieves this by applying, in parallel, the custom steering conditions needed for the particular state of each cell. This offers a substantial reduction in the number of individual programming steps needed for write, and permits a powerful binary search methodology for read, without having to carry out full sequential search operations. Performance is further improved by an increased chunk size, made possible by the low-current source-side injection mechanism, which allows every fourth floating gate element to be operated on.
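The advantage of a binary search over a full sequential search can be shown in a purely logical sketch. It abstracts away the column-oriented steering circuitry entirely: each comparison is assumed to be one sense operation against a chosen compare level, and the compare levels are hypothetical values listed in descending current order, with state 0 being the most conductive. The function names are illustrative.

```python
def read_state_sequential(cell_ua: float, compare_ua: list[float]) -> tuple[int, int]:
    """Scan compare points from highest to lowest current.

    Returns (state_index, comparisons); state 0 is the most conductive state.
    """
    comparisons = 0
    for k, level in enumerate(compare_ua):
        comparisons += 1
        if cell_ua > level:              # conducts more than this compare point
            return k, comparisons
    return len(compare_ua), comparisons


def read_state_binary(cell_ua: float, compare_ua: list[float]) -> tuple[int, int]:
    """Binary search over the same descending compare points."""
    lo, hi = 0, len(compare_ua)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if cell_ua <= compare_ua[mid]:   # at or below this point: a deeper state
            lo = mid + 1
        else:
            hi = mid
    return lo, comparisons


if __name__ == "__main__":
    levels = [80, 50, 20]                # hypothetical 4-state compare points (uA)
    for cell in (90, 60, 30, 10):
        print(cell, read_state_sequential(cell, levels), read_state_binary(cell, levels))
```

Both functions return the same state; the binary version needs about log2(N) comparisons per cell (2 for four states, 3 for eight), whereas the sequential scan may need up to N-1.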

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1a is a schematic representation of one embodiment of this invention which utilizes dynamic sensing of the selected memory cell;

Figure 1b is a graph depicting the voltages associated with sensing the state of the memory cell of the embodiment of Figure 1a;

Figure 2 is a block diagram depicting one embodiment of this invention in which trip times associated with reading a plurality of cells are converted to binary code;

Figure 3 depicts an alternative embodiment of this invention which uses a static sensing approach utilizing current comparators;

Figure 4a is a diagram depicting exemplary state ranges and counter/A/D resolution for 4-level multi-state encoding;

Figure 4b is a diagram depicting exemplary state ranges and counter/A/D resolution for 8-level multi-state encoding;

Figure 5 is a flowchart depicting the operation of one embodiment of this invention;

Figure 6 is a bit map depicting user data and overhead data associated with one embodiment of the present invention;
Figure 7 is a flowchart depicting in more detail one embodiment of the step of processing tracking cell data in Figure 5;

Figure 8 is a block diagram depicting programming and verification elements suitable for use in the embodiment of Figure 3;

Figure 9 is a flowchart depicting the operation of one embodiment of this invention as depicted in Figure 8;

Figure 10, composed of Figures 10a and 10b, is a flowchart depicting an alternative embodiment of this invention suitable for use in connection with the embodiment of Figure 8;

Figure 11 depicts an alternative embodiment of this invention which allows for improved verify processing;

Figure 12 is a diagram depicting one embodiment of a twin-cell of the present invention;

Figure 13 is a diagram depicting one embodiment of a cell suitable for use in connection with certain embodiments of this invention;

Figure 14 is a diagram depicting one embodiment of the cell-read operation of this invention using the cell embodiment of Figure 13;

Figure 15 is a flowchart illustrating one embodiment of this invention with reference to the embodiment of Figure 14;

Figure 16 is a diagram depicting an alternative embodiment of this invention in which sensing is performed on a plurality of bits simultaneously, as could be used in conjunction with the embodiment of Figure 14;

Figure 17 is a diagram depicting one embodiment of this invention in which common elements are used for both reading and multi-state programming;

Figure 18 depicts an alternative embodiment of this invention in which certain control elements are replicated, one set being used for programming and the other for read/verify operations;

Figure 19 is a diagram depicting one embodiment of an array suitable for use in accordance with the teachings of this invention;

Figure 20 is a diagram depicting an alternative array suitable for use in conjunction with the present invention; and

Figure 21 is a graph depicting the distribution of erased cell levels in accordance with certain embodiments of this invention.
DETAILED DESCRIPTION

A/D Sensing

A first step in this invention is acquiring the full analog value of the memory state (e.g. the actual cell current, which in turn reflects the actual stored floating gate voltage VFG). The following describes two alternative embodiments for rapidly sensing and converting to digital form the data stored in a large number of physical cells (e.g. a chunk of 256 cells) simultaneously, each cell being capable of storing a large number of multi-states (e.g. four states or more), with sensing capable of spanning a wide dynamic range. The basis underlying both of these embodiments is the analog property of the memory cell, wherein its current drive capability is in proportion to its stored floating gate charge (voltage). Consequently, each memory state is characterized by its current drive capability (in actuality, a narrow range of current drives, including margin capability). Therefore, sensing and discriminating the various states comes down to differentiating between the various drive level ranges. Two exemplary embodiments are now described for achieving this differentiation.

A first embodiment is described with reference to Figures 1a and 1b, and involves dynamic-type sensing, wherein the bit lines (such as bit line 101) of the selected memory cells (such as cell 102) are precharged (e.g. to 2.5v), and then the row (e.g. word line 103) of the selected cells is turned on, preferably using a controlled ramp (e.g. 5usec rise time) or a stepped staircase (for example over 5usec), allowing the respective bit lines to discharge through the selected memory cells at rates proportional to their current driving capability. When the bit lines discharge to a predetermined voltage (e.g. 1v), they flip a corresponding sense amplifier (e.g. sense amplifier 104), indicating sense achieved. The time taken to flip the sense amplifier from the start of sensing is an analog measure of the cell drive: the longer the time, the lower the drive (i.e. the cell is more programmed, having more negative charge on the floating gate, as depicted in Figure 1b).

Table 1 is an example of sense amplifier trip time versus cell current drive capability, based on simulation using floating gate cell I-V data.
Table 1

ICELL (uAmps)       20    30    40    50    60    70    80    90    100
Trip time (usec)    5.4   4.9   4.7   4.4   4.2   3.9   3.7   3.5   3.4

In the example of Table 1, bit line 101 is precharged to 5v and tripped at 2.5v, the load capacitance is 1.25pF, and the control gate is raised at 1.25v/usec, ramped to 7v in a staircase fashion. Because of disturbs, it is undesirable to expose the memory cell drain to more than 2v. Therefore, in one embodiment, the 5v precharge is applied to sense capacitor 105, isolated from the memory cell drain, and the drain is only allowed to charge to a lower voltage (e.g. 1.5v). With column segmentation, this drain voltage lowering is, in one embodiment, done locally, using a segment select transistor to limit the voltage transferred from a global bit line to the local bit line, such as is described in copending U.S. Patent 5,315,541 assigned to Sandisk Corporation.

In one embodiment, the trip times are converted en masse to a binary code using an A/D approach, as shown in Figure 2. Time is metered using clock 205, which increments master counter 204, an 8-bit counter in the example shown here. Counter 204 drives lines 209 (8 lines in this example), which feed into registers 201-1 through 201-N via transfer gates 202-1 through 202-N, respectively, with one register for each cell being sensed (e.g. 256 8-bit registers for a 256-bit memory chunk size). At the start of sensing, counter 204 is initialized to zero and then starts counting up, with the registers reflecting the count.

At the point of a cell sensing (i.e. at the sense amplifier trip time), the corresponding sense amplifier flips, which isolates the corresponding register from counter 204, thereby freezing the time (and its associated binary code) in that register. In this way, each register contains a binary representation of the analog storage level of the memory cell to the resolution of the A/D (e.g. with 8 bits this gives a resolution of approximately 1 part in 256, or about 0.4%).
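The counter-and-register conversion of Figure 2 can be modeled behaviorally as follows. This is a minimal sketch, assuming ideal, instantaneous sense amplifier trips and a hypothetical 25 ns clock period; the function name is illustrative. Each register follows master counter 204 until its cell's trip time falls inside the current clock interval, after which its transfer gate is treated as off and the value is frozen; a cell that never trips saturates at the maximum count.

```python
def counter_ad_convert(trip_times_us: list[float], clock_period_us: float,
                       bits: int = 8) -> list[int]:
    """Behavioral model of Figure 2: registers track a master counter and
    freeze when their sense amplifier trips."""
    max_count = (1 << bits) - 1
    registers = [0] * len(trip_times_us)       # counter starts at zero
    connected = [True] * len(trip_times_us)    # transfer gates initially on
    for count in range(max_count + 1):
        interval_end = (count + 1) * clock_period_us   # counter holds `count` until here
        for i, trip in enumerate(trip_times_us):
            if connected[i]:
                registers[i] = count                   # register follows the counter
                if trip < interval_end:                # sense amp flips during this count
                    connected[i] = False               # isolate: value is frozen
    return registers


if __name__ == "__main__":
    # Hypothetical 25 ns clock applied to trip times (usec) taken from Table 1.
    print(counter_ad_convert([5.4, 4.4, 3.4], 0.025))
```

Longer trip times (lower cell drive, more programmed cells) freeze at larger counts, so the register contents order the cells by their stored charge.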

To ensure both adequate resolution and dynamic range, the clock frequency (i.e. sampling rate) must be properly chosen. If the clock is too fast, the counter will not span the full range of times needed for a sense amplifier to flip for all possible stored memory cell data values before hitting the maximum count; if it is too slow, the result will be poor resolution and the risk of being unable to discriminate between neighboring states. In order to provide some relationship with the memory cells' drive characteristics, in one embodiment the frequency of clock 205 is governed by a memory cell (or group of memory cells) set at an appropriate drive level. In this way, clock 205 tracks process variation and operating conditions (e.g. voltage and temperature), setting up the optimum clocking rate to span the cell's dynamic range and associated memory states.
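The clock-rate trade-off reduces to simple arithmetic, sketched below. It assumes the slowest trip time to be measured is known (here a hypothetical 6.4 usec, a little above the 5.4 usec maximum in Table 1), for instance from a reference cell as just described; the helper names are illustrative.

```python
def clock_period_for_span(t_max_us: float, bits: int = 8) -> float:
    """Shortest clock period (finest resolution) for which the 2**bits
    counts of the counter span trip times up to t_max_us."""
    return t_max_us / (1 << bits)


def max_measurable_time_us(period_us: float, bits: int = 8) -> float:
    """Trip times at or beyond this value all read as the maximum count."""
    return period_us * (1 << bits)


def count_separation(t_a_us: float, t_b_us: float, period_us: float) -> float:
    """Clock counts separating two trip times, i.e. how well they are resolved."""
    return abs(t_a_us - t_b_us) / period_us


if __name__ == "__main__":
    period = clock_period_for_span(6.4)                # hypothetical slowest trip time
    print(period, count_separation(5.4, 4.9, period))  # 20 uA vs 30 uA in Table 1
    print(max_measurable_time_us(0.010))               # a clock that is too fast
```

With a 25 ns period, the 0.5 usec gap between the 20 uA and 30 uA entries of Table 1 spans 20 counts, whereas a 10 ns clock saturates an 8-bit counter at 2.56 usec, before even the fastest cell in Table 1 has tripped.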

Although this embodiment is relatively simple and effective, it has limitations by nature of being dynamic. Time constants associated with word line and/or bit line delays, and their variations, contribute both relative and absolute error. For example, if word line RC time constants are long relative to the ramp (or step interval) times, then there can be significant differences in the times at which cells along the word or steering line (or a single line serving as both the word line for selection and the steering line for capacitive coupling) experience a given word line steering drive voltage. The consequence is that cells at different positions along such lines will respond at different times. Also, the conversion from cell current drive to comparator trip time is not exactly linear, because the discharge rates and characteristics depend on the drive levels of the cell, which vary with the bit line bias level (with conduction tending to decrease as bit line voltage levels drop, stretching out bit line discharge time). In addition, the bit line capacitance can have a significant voltage dependence arising from junction C-V characteristics. This nonlinearity in comparator trip time results in a nonlinear time separation of states and margins in going from the lowest to the highest charged memory states (whereas it is desirable to space the memory states evenly, charge-wise, to get the maximum fit of states within the dynamic range and to have uniform margins).

A second exemplary embodiment removes these limitations by using a static sensing approach utilizing current comparators, as shown in the exemplary embodiment of Figure 3. The fixed reference voltage, Vref, of the embodiment of Figure 2 is replaced with a staircase reference current (Iref) source 310, which starts off at a minimum level, Imin, and increments by deltaI with each count of clock 305 (i.e. after n clock pulses, Iref = Imin + n*deltaI). For a given memory cell, when the reference current just exceeds the cell current, the associated one of current comparator sense amplifiers 104-1 through 104-N flips, freezing the corresponding count of counter 304 (which increments in sync with staircase current generator 310) into the corresponding one of registers 301-1 through 301-N. In one embodiment, the scale factor for staircase current source 310 (e.g. its maximum current) is established using one or a population of floating gate memory cells (e.g. strongly erased) in order to provide optimum dynamic range with tracking of process and operating conditions; i.e. the regulation of the current source includes monitoring the characteristics of one or more floating gate cells dedicated to this current source regulation.
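The staircase conversion of Figure 3 can be modeled in the same behavioral way. This minimal sketch assumes ideal comparators, Imin = 0 and deltaI = 1 uA in the example (hypothetical values, consistent with the 1 uamp, 7-bit example given later), and an illustrative function name; a cell whose current exceeds the top of the staircase simply saturates at the maximum count.

```python
def staircase_ad_convert(cell_currents_ua: list[float], i_min_ua: float,
                         delta_i_ua: float, bits: int = 7) -> list[int]:
    """Behavioral model of static sensing with a staircase reference current.

    At count n the reference is Iref = i_min_ua + n * delta_i_ua. A cell's
    register freezes at the first n for which Iref exceeds the cell current;
    cells that never trip keep the maximum count.
    """
    max_count = (1 << bits) - 1
    registers = [max_count] * len(cell_currents_ua)
    frozen = [False] * len(cell_currents_ua)
    for n in range(max_count + 1):                 # counter and current source step together
        i_ref = i_min_ua + n * delta_i_ua
        for k, i_cell in enumerate(cell_currents_ua):
            if not frozen[k] and i_ref > i_cell:   # comparator flips
                registers[k] = n
                frozen[k] = True
    return registers


if __name__ == "__main__":
    print(staircase_ad_convert([0.5, 20.2, 64.7, 90.0, 200.0], 0.0, 1.0))
```

Unlike the dynamic scheme, the frozen count now rises with cell current, because Iref must climb higher before it exceeds a strongly conducting cell, and for counts above zero the mapping is linear: Icell lies between Imin + (n-1)*deltaI and Imin + n*deltaI.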

This second embodiment, while somewhat more complex, offers better control and linearity and minimizes or eliminates sensitivity to dynamic effects. This includes eliminating the repetitive, controlled ramping of word lines needed in the case of dynamic sensing, simplifying many of the timing and associated control operations.

Once sensing is completed and data is frozen into all registers 301-1 through 301-N, it is shifted out, for example serially. A simple way to do this is to tie registers 301-1 through 301-N together in shift register fashion. In the above example, the data stored in each register comprises eight bits, requiring an eight-line-wide bus to shift the full data out of the memory chip in one controller clock cycle (for example to a memory controller, such as is described in U.S. Patent 5,430,859 assigned to Sandisk Corporation, for sending to requesting devices), and thus requiring eight output pads/pins. If the data rate to the controller is less critical while keeping the number of pads/pins down is important, then the eight bits could be broken down, e.g. shifting out the four MSBs first followed by the four LSBs through four pads in two controller clock cycles, or shifting out groups of two bits four times through two output pads in four controller clock cycles, etc.
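The trade between pad count and controller clock cycles can be sketched as a pair of helpers that split each 8-bit register value into pad-wide groups, most significant group first, and reassemble them on the controller side. The function names and the list-of-integers representation are illustrative assumptions.

```python
def serialize_for_pads(values: list[int], pads: int, bits: int = 8) -> list[int]:
    """Split each `bits`-wide register value into `pads`-wide words, MSB group first.

    Each returned word is what would appear on the output pads in one
    controller clock cycle.
    """
    if bits % pads:
        raise ValueError("pads must divide the register width")
    groups = bits // pads
    mask = (1 << pads) - 1
    words = []
    for value in values:
        for g in range(groups - 1, -1, -1):        # most significant group first
            words.append((value >> (g * pads)) & mask)
    return words


def deserialize_from_pads(words: list[int], pads: int, bits: int = 8) -> list[int]:
    """Reassemble register values from consecutive pad words."""
    groups = bits // pads
    values = []
    for start in range(0, len(words), groups):
        value = 0
        for word in words[start:start + groups]:
            value = (value << pads) | word
        values.append(value)
    return values


if __name__ == "__main__":
    print(serialize_for_pads([0xA7], 4), serialize_for_pads([0xA7], 2))
```

Each register costs 8 / pads controller clock cycles: one cycle with eight pads, two with four, and four with two.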

Tracking/Data Scrambling

As previously stated, one goal of the present invention is to provide self-consistent, adaptive and tracking capability for sensing, capable of establishing both the data and the "quality" of the data (i.e. the margins). In accordance with certain embodiments of this invention, tracking cells, such as those described in U.S. Patent 5,172,338 assigned to Sandisk Corporation, are included within each of the sectors. These tracking cells are set at known states to reliably establish the optimum discrimination points for each of the various states. In certain embodiments, this is accomplished using as few as one cell per state. However, if better statistics are vital to establishing the optimum discrimination point, a small population of cells sufficient to establish such optimum points statistically is used. For example, in one embodiment ten physical cells are used for each state, in which case, for 4-state encoding, a total of 40 physical cells are used as part of the overhead portion of the sector.

As will be described below, data from these tracking cells will be the first information from the sector to be read into the controller, in order to establish the optimum discrimination points for the remainder of the sector data. However, in order to make these cells track the rest of the sector in terms of data history and wear, they are not repeatedly erased and written into the same, fixed, pre-assigned states, because the amount of wear would then be peculiar to that state and might not reflect the wear/history of the remainder of the sector. In one embodiment, managing wear, both in terms of ensuring uniformity (i.e. intra-sector wear leveling) and keeping such wear to a minimum, is handled by some method of continuous or periodic re-assignment of each of the logical states (e.g. logical states L0, L1, L2 and L3) to a corresponding physical state (e.g. physical states P0, P1, P2 and P3), an example of which is shown in Table 2. These physical states P0 to P3 correspond to specific conduction levels of each memory cell; e.g. P0 is the highest conducting state, P1 the next highest, P2 the next highest, and P3 the least conductive state. A description of this concept applied to two-state encoding, termed "program/inverse program", is disclosed in U.S. Patent 5,270,979 assigned to Sandisk Corporation.

Re-assignment of states with subsequent writes (in one embodiment with each subsequent write, and in alternative embodiments after a specific number of writes) is done, for example, by rotation or on a random number basis. This guarantees that, on average over many cycles, only about half of the full possible charge is transported to the cells, and that the wear of each cell is virtually identical to that of all others within its sector. The embodiment utilizing a random number assignment between logical and physical states has the advantage that it eliminates the possibility of synchronization between the logical-to-physical data re-assignment algorithm and variable user data, which would defeat such wear leveling.
Table 2

                  Physical State Assignment
Logical State     #1    #2    #3    #4
L0                P0    P3    P2    P1
L1                P1    P0    P3    P2
L2                P2    P1    P0    P3
L3                P3    P2    P1    P0
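Reading Table 2 column by column shows that it is a rotation: under assignment #(k+1), logical state Li is stored as physical state P((i - k) mod 4). The sketch below encodes that rule together with the random-number alternative; using Python's `random.Random.shuffle` as the random source is an assumption for illustration, as are the function names.

```python
import random

NUM_STATES = 4  # 4-state encoding, as in Table 2


def rotation_assignment(k: int, n: int = NUM_STATES) -> list[int]:
    """Physical state index for each logical state under assignment #(k+1).

    Entry i is the index of the physical state (0 = P0, most conductive)
    that stores logical state Li, namely P((i - k) mod n).
    """
    return [(i - k) % n for i in range(n)]


def random_assignment(rng: random.Random, n: int = NUM_STATES) -> list[int]:
    """Alternative: a random logical-to-physical permutation for the next write."""
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def encode(logical_data: list[int], assignment: list[int]) -> list[int]:
    """Translate logical states into the physical states actually written."""
    return [assignment[d] for d in logical_data]


if __name__ == "__main__":
    for k in range(NUM_STATES):
        print(f"#{k + 1}:", [f"P{p}" for p in rotation_assignment(k)])
```

Run as a script, it prints the four columns of Table 2. Averaged over the four rotations, every logical state visits each physical state once, so its mean physical index is 1.5 on a 0 to 3 scale, consistent with the observation above that only about half of the full possible charge is transported on average.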

All tracking cells for each given logical state are re-assigned to the same physical state, e.g. all ten cells of the tracking group assigned the role of storing logical state L1 are set to either P0, P1, P2 or P3 for a particular write cycle, as dictated by the scrambling algorithm. Given that the tracking cells go through the same scrambling operation as the remainder of the sector, they not only reflect the wear of that sector, but also provide the means to translate back from physical to logical state. Since each tracking group is given a constant, pre-assigned logical state responsibility, when the controller deciphers the various tracking cell groups (e.g. the four groups of ten cells each) it concurrently establishes the translation for the sector.
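The translation step can be sketched as a ranking: the tracking group with the highest measured level must be in P0, the next in P1, and so on. In this minimal sketch the per-group levels are assumed to be already-averaged values that increase with cell current (as in the ordering described for Table 2), and the function names are hypothetical.

```python
def decode_translation(group_levels: list[float]) -> list[int]:
    """Infer which physical state each logical tracking group was written to.

    group_levels[i] is the averaged level of the group that always stores
    logical state Li. A higher level is assumed to mean a more conductive
    cell, so the highest group is P0. Returns phys[i] = physical index of Li.
    """
    order = sorted(range(len(group_levels)),
                   key=lambda i: group_levels[i], reverse=True)
    phys = [0] * len(group_levels)
    for rank, logical in enumerate(order):
        phys[logical] = rank
    return phys


def physical_to_logical(phys: list[int]) -> list[int]:
    """Invert the translation: entry p is the logical state stored as Pp."""
    logical_of = [0] * len(phys)
    for logical, p in enumerate(phys):
        logical_of[p] = logical
    return logical_of


if __name__ == "__main__":
    phys = decode_translation([5, 95, 65, 35])   # hypothetical levels for L0..L3
    print(phys, physical_to_logical(phys))
```

For tracking-group levels of 5, 95, 65 and 35 for L0 to L3, the ranking places L0 in P3, L1 in P0, L2 in P1 and L3 in P2, which is column #2 of Table 2; the inverse then tells the controller which logical state each physical level of the user data represents.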

Resolution Requirements

More resolution requires more time to sense (more steps in the A/D), more die area associated with the larger registers, more cost associated with shipping data out to the controller (more parallelism dictates more pads and thus an area penalty or, with the same number of pads, takes longer to shift out all the data, and thus a performance penalty), and more cost associated with processing the data in the controller. Inadequate resolution results in limited visibility into common-mode population margin shifts (e.g. due to trapping/detrapping effects), resulting in larger error in establishing comparator points. This larger error must be included in the multi-state budget, forcing larger separation between states and consequently fewer states, i.e. lower multi-state scalability.

A reasonable resolution target is an A/D resolution equal to approximately 3% of the state-to-state separation. This provides visibility into sufficiently small cell current shifts within a population to allow meaningful correction (i.e. avoiding margin failure from tail bits within a population due to poorer resolution), and does not impose such a high resolution that it becomes meaningless vis-a-vis the various noise and error terms associated with setting and measuring states.

Specific examples of state ranges and counter/A/D resolution are shown in Figures 4a and 4b for 4-level and 8-level multi-state encoding, respectively. The cell current/floating gate voltage relationship used for read in Figures 4a and 4b is representative of the characteristics of cells built in accordance with the teachings of the present invention using 0.5 micron flash semiconductor fabrication technology available today, which, for example, has an I/V slope of approximately 20 uamps/volt with the zero-current intercept (projected threshold) at 4.25v.

In the example shown, the state-to-state separation for a four-state cell is 30uamps, the A/D resolution is 1uamp, and the dynamic range covered is 0 to 128uamps. This gives a resolution of about 1/30 of the state-to-state separation (3.3%). A population of cells written into a given intermediate state is confined to a 10uamp window, i.e. spanning ten steps of resolution. Therefore one A/D step offers 10% resolution of the written population distribution, and any common-mode shift of that population over time can be corrected in 10% resolution steps. Therefore, for 4-state encoding, a 7-bit A/D is suitable.

The situation is similar for the eight-state example of Figure 4b, except that the state-to-state separation is 15uamps and the A/D resolution is 0.5uamps, covering the same 0 to 128uamps dynamic range. This offers the same percentage resolution of the population, for which an eight-bit A/D is suitable.
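The resolution figures above follow from a little arithmetic, sketched here with illustrative helper names: the A/D width is the number of bits needed to cover the dynamic range in steps of the chosen size, and each step can be expressed as a fraction of the state-to-state separation or of the population window.

```python
import math


def ad_bits(full_range_ua: float, step_ua: float) -> int:
    """Smallest A/D width whose 2**bits steps of step_ua cover full_range_ua."""
    return math.ceil(math.log2(full_range_ua / step_ua))


def step_fraction(step_ua: float, span_ua: float) -> float:
    """One A/D step expressed as a fraction of a current span."""
    return step_ua / span_ua


if __name__ == "__main__":
    # 4-state example: 30 uA separation, 1 uA steps, 10 uA population window.
    print(ad_bits(128, 1.0), step_fraction(1.0, 30.0), step_fraction(1.0, 10.0))
    # 8-state example: 15 uA separation, 0.5 uA steps, same 0-128 uA range.
    print(ad_bits(128, 0.5), step_fraction(0.5, 15.0))
```

Halving the step size to keep the same 3.3% fraction of a halved state separation costs exactly one extra A/D bit, which is why the 8-state case needs eight bits where the 4-state case needs seven.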

Adaptive Multi-State Discrimination

The following describes the data flow and handling by the controller for each sector read operation. In order to support high speed, in one embodiment this operation is performed in hardware and/or firmware. For the purposes of the following discussion, the example of 4-state encoding, with 7-bit sensing resolution (providing 128 steps of approximately 1 uamp per step) and ten tracking cells for each of the four states, is used. Figure 4a depicts 4-state encoding with each step of resolution corresponding to approximately 1 uamp (therefore about a 100 uamp full range). In the embodiment depicted in Figure 4a, four states are shown: physical states P0, P1, P2, and P3. State P0 is established by setting the cell to have a cell current under read conditions of 90 uamps or more (e.g. by erasing the cell to that value). When reading, state P0 is detected when the cell current is 85 uamps or more, thereby allowing a slightly more relaxed tolerance for reading than for writing. The programming levels for states P1, P2, and P3 are also shown in Figure 4a, as are the looser read current levels for each of those states. An appropriate guard band is placed between each pair of adjacent states such that, for example, a cell current during read between 75 and 85 uamps is too ambiguous to be associated with either of adjacent states P0 and P1.

The operation of this embodiment will now be described with respect to the flowchart of Figure 5 and the diagram of Figure 6. First, the reference tracking cells' data is shifted into the controller, one 8-bit set (or byte) for each cell. This data is then processed as illustrated in more detail in the flowchart of Figure 7, starting with the first tracking cell group, assigned to logical state L0 as described in Table 2. The function of these bits is to establish the optimum compare point for the L0 state by first establishing where the center of the population of tracking cells placed into the L0 state lies. For a population of ten cells per state, this can be accomplished by continuously summing the data of each successive L0 cell, giving the accumulated data of those ten cells. It is desirable to maintain a max register and a min register concurrently, in order to minimize the chance of error from an isolated, errant cell, either high or low. This is done by comparing each successive piece of data to the previously stored comparator data and, at each compare operation, storing the higher (lower) value into the max (min) comparator. Once data from all ten cells has been shifted in, it is processed to establish the filter point, for example by subtracting the max and the min from the sum (leaving the sum of the remaining eight cells) and dividing the result by 8 (i.e. shifting it to the right three times), giving the average storage level of the L0 assigned tracking cells. Rounding to the nearest number is, in one embodiment, accomplished by shifting to the right three times but temporarily storing the third bit shifted out and then summing this bit with the shifted value. This is then repeated for the L1, L2 and L3 tracking cell populations, at which point the system has determined the physical to logical conversion for each state.

In one embodiment, this conversion is performed by ordering the L0, L1, L2, and L3 states into descending order, and then matching this order to the corresponding physical state assignment as shown in Table 2. For example, if L0 happens to correspond to physical state P0 it will have the highest value of the four states; if L0 corresponds to physical state P1 it will have the next highest value, and so forth, and likewise for states L1, L2, and L3. If after ordering the order is L0, L1, L2, L3, then state assignment #1 of Table 2 was used. On the other hand, if the order is L1, L2, L3, L0, then assignment #2 was used, and so forth per Table 2. In this embodiment, the optimum discrimination points between the four physical levels P0, P1, P2, and P3 are established by calculating the midpoints between P0 and P1, P1 and P2, and P2 and P3. Slightly better precision is achieved by postponing the division by 8 for the individual ten cell groups until after summing P0 and P1, P1 and P2, etc., at which point the average of P0 and P1 is obtained by summing P0 and P1 and dividing by 16 (shifting four to the right, with provisions for rounding), and similarly for P1 and P2, and P2 and P3, thereby establishing three compare values, C1, C2, C3, respectively, which are shown in Figure 4a as current points 80, 50, and 20 between states P0, P1, P2, and P3.
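The tracking-cell processing just described can be sketched in Python as follows. This is a hypothetical illustration, not the controller's actual implementation. The function names, the use of plain integer lists for the 8-bit readings, and the assumption of exactly ten cells per tracking group are illustrative choices. The sketch shows four steps: max/min rejection, the divide-by-8 with round-to-nearest via the last shifted-out bit, the descending sort that recovers the logical-to-physical assignment, and the postponed division by 16 that yields C1, C2, C3.

```python
from typing import Dict, List, Tuple


def rshift_round(value: int, n: int) -> int:
    """Divide by 2**n by shifting right n times, rounding to nearest by
    adding back the last bit shifted out (half values round up)."""
    return (value >> n) + ((value >> (n - 1)) & 1)


def trimmed_sum(readings: List[int]) -> int:
    """Accumulate the readings while tracking max and min registers,
    then drop the single highest and lowest cell (ten cells -> eight)."""
    total = 0
    hi = lo = readings[0]
    for r in readings:
        total += r
        hi = max(hi, r)
        lo = min(lo, r)
    return total - hi - lo


def group_average(readings: List[int]) -> int:
    """Average storage level of one tracking group: trimmed sum / 8, rounded."""
    return rshift_round(trimmed_sum(readings), 3)


def compare_points(groups: Dict[str, List[int]]) -> Tuple[List[str], List[int]]:
    """groups maps a logical state name (e.g. "L0") to its ten readings.

    Returns the logical states in descending level order (index i is the
    logical state stored at physical level Pi) and [C1, C2, C3].
    Division is postponed: (sum_a + sum_b) / 16 is the midpoint of two
    eight-cell averages."""
    sums = {state: trimmed_sum(r) for state, r in groups.items()}
    order = sorted(sums, key=sums.get, reverse=True)
    levels = [sums[s] for s in order]
    comps = [rshift_round(levels[i] + levels[i + 1], 4) for i in range(len(levels) - 1)]
    return order, comps
```

With made-up readings of 90, 70, 30 and 10 for groups L0 to L3, `compare_points` returns the order L0, L1, L2, L3 and compare points 80, 50, 20, matching the example values quoted above. A single errant reading of 255 among ten otherwise equal L0 cells is discarded by the max register and does not move the group average.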

This then gives the optimum compare or filter points for the rest of the sector's data, which is now shifted in. As data is passed through, it is sifted through a set of comparators (for example, as described later with reference to the flowcharts of Figures 5 and 7) set at those compare points to establish each cell's state; i.e. higher than C1 (making it state P0), between C1 and C2 (making it P1), between C2 and C3 (making it state P2), or lower than C3 (making it state P3). These are then translated to their corresponding logical states, based on the specific logical to physical assignment used, as discussed above. In one embodiment, compare points C1, C2, C3 loaded into the comparators are adaptive in nature, established by the sector itself via the tracking cells. In this way the sensing tracks the properties of the population of cells within the sector, their operating voltage and temperature conditions, history and wear, and any common mode drift, as for example may arise from detrapping of gate oxide trapped charge accumulated during write cycling. Since such detrapping is also present in the tracking cells, they establish the optimum point for sensing, whatever the degree of detrapping, provided their conduction remains within the dynamic range of cell state sensing capability (i.e. the ability to still discriminate between the various states), and the mechanism is truly common mode, with minimal dispersion.
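A minimal sketch of the sifting step follows. It assumes that the compare points C1, C2, C3 and the descending order of logical states have already been derived from the tracking cells. It also assumes that a reading exactly equal to a compare point is assigned to the lower-current state, since the text does not specify tie handling. The function names are hypothetical.

```python
from typing import List, Sequence


def physical_state(reading: int, c1: int, c2: int, c3: int) -> int:
    """Return 0..3 for P0..P3. Higher cell current means less programmed,
    so P0 (erased) lies above C1 and P3 lies below C3."""
    if reading > c1:
        return 0
    if reading > c2:
        return 1
    if reading > c3:
        return 2
    return 3


def logical_states(readings: Sequence[int], comps: Sequence[int], order: List[str]) -> List[str]:
    """Translate each reading to its logical state; order[i] is the logical
    state that the tracking cells showed to be stored at physical level Pi."""
    c1, c2, c3 = comps
    return [order[physical_state(r, c1, c2, c3)] for r in readings]
```

Because the translation is a simple lookup, the rotation between logical and physical assignments costs nothing extra per cell once the order has been established.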

In one embodiment, this adaptive adjustment of the compare points is performed in a continuous, real time manner. In an alternative embodiment, the optimum compare points for the L0 state, as well as for the other states L1-L3, are established periodically as part of a maintenance operation, and not in real time as actual data is being read, to reduce the impact on system performance. This latter approach improves performance by eliminating the repetitive overhead time associated with processing the tracking cell data. In one embodiment, it is invoked on a predetermined read interval basis as part of a read/margins checkout, and/or invoked in the rare event of read marginality or failure. This gives the ability to recover data or restore margins through data rewrite using the optimum read reference conditions established via the tracking cells.

In one embodiment, a sector is broken down as shown in Figure 6 to include user data and overhead bytes. The overhead bytes include a plurality of reference tracking cells for monitoring the condition of one or more cells known to be programmed to each of the logical states in the multi-state memory. The overhead also includes, if desired, header information such as address information, ECC bits, bit and/or sector mapping related information, and counts of the number of writes to the sector. Referring again to Figure 5, as the rest of the sector's data is read and processed using the compare points established from the reference tracking cells' characteristics, a decision is made as to whether the data is acceptable or not. If not, gross defect management is invoked, such as described in U.S. Patent 5,602,987. On the other hand, if the data is acceptable, a decision is made as to whether the data is "clean", i.e. of sufficiently high quality that there are no data margin or ECC related problems. If the answer is yes, the data is sent out to the host without further intervention; conversely, if the answer is no (i.e. the data is not clean), the necessary error correction or "clean up" step is invoked, not only sending the data out to the host but also ensuring that the corrected data is clean upon subsequent reads.

Data Quality Assessment and Response

As described above, one feature derived from this invention is the ability to concurrently determine not only the data itself but also the "quality" of each data point, or its margin, with respect to the above described compare points. Even when a bit of data is read correctly, if it gets too close to a compare point, it may become unreliable sometime in the future, giving erroneous readings due to noise sensitivity, additional margin shift, or changes in operating conditions arising from power supply or temperature variation. Therefore, the quality measurement achieved by this invention provides a failure look-ahead capability, something handled in the prior art using special read-under-margin operations. Such prior art read-under-margin operations generally involve multiple pass reads, invoked under special conditions or circumstances, and require special circuitry (which may include controlled changes to reference/sensing circuitry or special cell biasing operations) to establish the needed margin differentials. Often, the accuracy or resolution of such differential means is limited, forcing larger margins than absolutely required. In the case of multi-state, this would dictate wider memory threshold voltage windows per state, and consequently wider voltage separation between states, thereby resulting in fewer states available for a given cell's dynamic voltage range, and consequently lower memory storage density per cell. However, with the novel approach of the present invention, the margin or "quality" of the data is a natural byproduct of each read operation, requiring no special modes or events to initiate it, and allowing the system to react instantly to any detection of marginal data. In essence, the capability of "look ahead data recovery" is automatically included in each read operation. However, instead of such a margining operation being considered a very rare operation for a very rare event, in accordance with the present invention the trade-off made in order to achieve high density multi-state is to allow a substantially higher incidence of such marginality, with such marginality being made manageable by providing a measure of it as part of the standard read operation.

In one embodiment, the specific way such marginality detection is implemented includes an additional pair of values around each of the compare values C1, C2, C3 (C1+del and C1-del, C2+del and C2-del, etc.), shown in Figure 4a as "poor margin filter", together with associated comparators (not shown). Any data falling between the compare points C1, C2, C3 and their associated +/- del points is tagged as marginal. For example, state P2 falls between compare values C2 and C3; if a P2 cell is detected to be between C2 and C2-delta, or between C3+delta and C3, it is tagged as marginal. Consequently, each piece of 4-state data can have a three bit result: the first two bits, A and B, for the actual data, and a third bit, Q, for its marginality or "quality" (e.g. 0 if OK and 1 if marginal), as depicted in Table 3.

Table 3

RESULTS    NO MARGINALITY PROBLEMS    MARGINAL DATA
A          0   0   1   1              0   0   1   1
B          0   1   0   1              0   1   0   1
Q          0   0   0   0              1   1   1   1
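The three-bit result of Table 3 can be sketched as below. The following assumptions are not taken from the text. The A,B columns of Table 3 correspond to physical states P0 through P3 in order. A reading exactly at a +/- del boundary counts as marginal. The function name and the integer readings are illustrative.

```python
from typing import Sequence, Tuple


def sense_with_quality(reading: int, comps: Sequence[int], delta: int) -> Tuple[int, int, int]:
    """Return (A, B, Q) for a 4-state cell.

    A, B encode the physical state P0..P3 as 00, 01, 10, 11 (assumed ordering).
    Q is 1 when the reading lies within delta of any compare point, i.e.
    inside one of the "poor margin filter" bands around C1, C2, C3."""
    # The number of compare points at or above the reading is the state index.
    state = sum(1 for c in comps if reading <= c)
    q = int(any(abs(reading - c) <= delta for c in comps))
    return state >> 1, state & 1, q
```

With compare points 80, 50, 20 and a hypothetical del of 5, a reading of 83 is decoded as P0 (A=0, B=0) but flagged Q=1, while a reading of 35 is decoded as P2 with Q=0.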

In one embodiment, the quality of the data includes additional information, for example whether the sensed parameter (e.g. cell current) is too high or too low with respect to the center of that state's population (e.g. for state P2, if found between C2-delta and C2 it is too high, whereas if found between C3 and C3+delta it is too low). This allows a clean up reaction conditional on the direction of marginality. For example, if a memory cell's marginality is a consequence of being shifted towards being too heavily programmed, the course of action is to re-erase and program that data as part of a full sector data scrub operation. On the other hand, if a memory cell's marginality is such that it is shifted towards being too heavily erased, recovery of proper margin for the state of the memory cell is accomplished by programming only that one memory cell slightly, in order to regain its needed margin or "quality". An example of the latter is the case of relaxation of trapped channel electrons (which can accumulate after a large number of writes to a cell or a group of cells), which causes cell margins to drift from a more to a less heavily programmed condition. In such a case, it is sufficient to add some programming operations to regain cell state margins; no sector erase before programming is required.
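The direction-dependent clean-up decision can be expressed as a short hypothetical sketch. The returned action strings, the function name, and the treatment of the open-ended bands above C1 and below C3 are assumptions. A reading near the upper compare point of its band has too much current, meaning it has drifted toward erased, so it gets a selective touch-up program. A reading near the lower compare point has drifted toward over-programmed and calls for a full sector erase and rewrite.

```python
import math
from typing import Sequence


def scrub_action(reading: float, comps: Sequence[float], delta: float) -> str:
    """comps = (C1, C2, C3) in descending current order."""
    bounds = [math.inf, *comps, -math.inf]          # band edges for P0..P3
    state = sum(1 for c in comps if reading <= c)   # physical state index 0..3
    upper, lower = bounds[state], bounds[state + 1]
    if reading >= upper - delta:
        # Current too high for this state: shifted toward erased.
        return "program this cell slightly"
    if reading <= lower + delta:
        # Current too low for this state: shifted toward over-programmed.
        return "erase sector and rewrite"
    return "no action"
```

For state P2 with compare points 80, 50, 20 and a del of 5, a reading of 47 lies between C2-delta and C2 and selects the single-cell touch-up. A reading of 23 lies between C3 and C3+delta and selects the sector scrub.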

In one embodiment, a count is stored within each sector as part of the sector's header and is incremented each time a corrective action associated with a read scrub takes place. Once this count reaches a maximum allowed level, CMAX, the corrective action invoked is to map out the marginal/failing bits, whereas prior to reaching this CMAX value, data is rewritten without such mapping. This embodiment preserves the sector longer before the entire sector is retired from service. It does so by avoiding nuisance marginalities that would otherwise result in excessive bit and sector mapping, while filtering out the truly bad bits which should be mapped out. Once the CMAX count is reached for a sector and the failing marginal bit is mapped out, the counter is reset to zero and the procedure is repeated.
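A minimal model of the per-sector scrub counter follows. It assumes the count is incremented before it is compared against CMAX, since the text leaves the exact ordering open. The class and method names are hypothetical.

```python
class ScrubCounter:
    """Per-sector count of read-scrub corrective actions, kept in the header."""

    def __init__(self, cmax: int) -> None:
        self.cmax = cmax
        self.count = 0

    def corrective_action(self) -> str:
        self.count += 1
        if self.count >= self.cmax:
            self.count = 0                 # reset once the failing bit is mapped out
            return "map out marginal bits"
        return "rewrite data"              # below CMAX: rewrite without mapping
```

With CMAX set to 3, four successive scrubs produce rewrite, rewrite, map out, rewrite. A bit that is only occasionally marginal is therefore rewritten rather than retired.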

Multi-State Writing

Writing the multi-state data is now described with reference to the exemplary circuit diagram of Figure 8 and the associated flow chart of Figure 9. With reference to Figure 8, the components located within the dashed line are replicated for each sector. Following the unconditional sector erase, data is written into that sector on a chunk by chunk basis. Starting with the first chunk, the first intermediate state, state P1, is programmed. This is initiated by using a short, low voltage VCG pulse (for example approximately 4 usec at 2 V control gate bias), followed by a verify read against a reference current set at the level appropriate for state P1. For bits within the chunk that are targeted to receive this programming and become sufficiently programmed, an internal circuit locks out further programming of those bits. Targeted cells that are still insufficiently programmed experience the next programming pulse, which is of the same width as the first but has an incrementally higher VCG (e.g. 200 mV higher), again followed by verify. This sequence of programming with incrementally higher VCG followed by verify continues until all state P1 cells targeted within the chunk are verified, or until a maximum VCG is reached (in which case defect management is invoked). Then the next intermediate state, state P2, is written in similar fashion to the first intermediate state P1, but using the reference current setting associated with that state, and starting with a VCG level appropriate for reliably programming that state in the shortest time. This procedure is repeated for each state until all states in the chunk are programmed and verified, and the whole process is repeated on the remaining chunks on a chunk by chunk basis.
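The state-by-state program/verify loop can be sketched as follows. The cell model is a deliberate simplification and purely hypothetical: each cell is given a "required VCG" in millivolts, and it passes verify once it has received a pulse at or above that level. Voltages are kept as integers in mV to avoid floating-point drift. The 200 mV step follows the example in the text; the maximum VCG and the per-state start levels are assumptions.

```python
from typing import Dict, List


class MaxVcgReached(Exception):
    """Raised when a state cannot be verified before the maximum VCG."""


def program_chunk(targets: List[int], required_mv: List[int], start_mv: Dict[int, int],
                  step_mv: int = 200, max_mv: int = 9000) -> int:
    """Program one chunk state by state (P1, then P2, then P3) and return
    the total number of program pulses applied.

    targets[i]     -- physical state 0..3 for cell i (0 = erased, never pulsed)
    required_mv[i] -- hypothetical VCG at which cell i passes verify
    start_mv[s]    -- starting VCG for state s
    """
    pulses = 0
    for state in (1, 2, 3):
        pending = {i for i, t in enumerate(targets) if t == state}
        vcg = start_mv[state]
        while pending:
            if vcg > max_mv:
                raise MaxVcgReached(f"state P{state}: invoke defect management")
            pulses += 1                                            # one fixed-width pulse
            pending = {i for i in pending if required_mv[i] > vcg}  # lock out verified cells
            vcg += step_mv                                         # next pulse is higher
    return pulses
```

For example, `program_chunk([0, 1, 1, 2, 3], [0, 2000, 2400, 3000, 4000], {1: 2000, 2: 2800, 3: 3600})` applies three pulses for P1, two for P2 and three for P3.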

An alternative embodiment, depicted in the flowchart of Figure 10, provides an increase in speed. In this embodiment all states within a chunk of bits are programmed concurrently in a single VCG staircase progression, as follows. The data to be written into the chunk is shifted into the corresponding registers (e.g. register 43 of Figure 8), exactly mirroring the readout operation, and the corresponding bit RS latch 46 is set, enabling its associated bit line driver. Each physical data state, P0, P1, P2, P3, has an associated register count and corresponding current level. After each programming pulse, the reference current staircase is invoked in analogous fashion to the read operation, with the master counter concurrently incremented. A comparator circuit associated with each register (formed of transfer gate 41 and XOR gate 42) compares the input data (i.e. count) stored in register 43 to that of master counter 44. When a match occurs, the program lockout feature upon verify is enabled. Actual lockout only occurs when the corresponding cell is sufficiently programmed to pass read verify with respect to the associated reference current setting (i.e. programmed into the associated physical state). Once verify is successful, NAND gate 45 resets RS latch 46, disabling its associated bit line driver 47 and thereby disabling all subsequent programming of that cell for the remainder of the sector write operation. If verify fails, the cell will receive the next VCG-incremented programming pulse, followed again by the scanned current source/master counter verify procedure.

Unlike reading, which calls for use of the entire current staircase to resolve the state to full analog precision, the write/verify operation only needs those reference current settings and associated counts specific to the set of memory states, e.g. specific to states P1, P2, P3 as predefined (P0, being the erased state, is excluded and inhibited from programming from the outset). This speeds up the verify process by requiring three settings in the case of 4 states, in place of the 128 settings exemplified for the read operation of Figure 4a, where 128 settings allow quality determinations to be made. Therefore, as illustrated in the example of Figure 10, each verify consists of a three step staircase operation. The first step consists of the following actions: setting up (e.g. rapidly incrementing up to) the first reference current level associated with physical state P1; concurrently setting up (e.g. counting) the master counter to the corresponding counter value; performing a read/sense operation; and locking out from further programming any cells which both match their register value to that of the master counter and are read as programmed (with respect to the corresponding reference current setting). Each following step of the three step operation consists of setting up (e.g. rapidly counting up to) the next data current level and corresponding reference current setting and repeating the read/sense operation, identically to the first step, until all three steps are completed.
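The single-staircase variant can be modelled with the same kind of simplified, hypothetical cell model: a cell passes verify at its target level once a pulse has reached its required VCG. The register counts for P1 to P3 (70, 35, 10) are made-up values, not figures from the text. A register value of None stands for P0, which is inhibited from the outset. Each program pulse is followed by a three-step verify in which the master counter steps through the three counts. A cell's latch is reset only when its register matches the master counter and the cell reads as programmed.

```python
from typing import List, Optional

STATE_COUNTS = (70, 35, 10)   # hypothetical register counts for P1, P2, P3 (descending current)


def program_chunk_concurrent(registers: List[Optional[int]], required_mv: List[int],
                             start_mv: int = 2000, step_mv: int = 200,
                             max_mv: int = 9000) -> int:
    """Return the number of program pulses needed to lock out every cell."""
    latch = [r is not None for r in registers]        # RS latch set: bit line driver enabled
    vcg, pulses = start_mv, 0
    while any(latch):
        if vcg > max_mv:
            raise RuntimeError("max VCG reached: invoke defect management")
        pulses += 1                                   # pulse every cell whose latch is set
        for master in STATE_COUNTS:                   # three-step verify staircase
            for i, reg in enumerate(registers):
                programmed = vcg >= required_mv[i]    # simplified read against the reference
                if latch[i] and reg == master and programmed:
                    latch[i] = False                  # lockout for the rest of the sector write
        vcg += step_mv
    return pulses
```

Here all target states share one VCG staircase, so a chunk whose slowest cell needs 4000 mV finishes after the pulse at that level, whatever mix of states it contains.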

Note that a full match of the 8 bits may not be necessary; it may be sufficient for enough MSBs (most significant, or highest current weight, bits) to match. This is most applicable when there are far fewer allowed states and corresponding cell current targets than the resolution of the A/D. In this case, as long as the MSBs uniquely differentiate each of the various states (e.g. a minimum of two MSBs for 4 states and four MSBs for 16 states), only those MSBs are required for the exclusive OR. This saves some of the area associated with the exclusive OR circuitry, but does somewhat restrict the current assignment flexibility for each state.
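The partial match can be illustrated with a comparison over the top k bits of the 8-bit counts. The helper name and the example counts are hypothetical. The point illustrated is only that k MSBs suffice when they already separate all 2^k target states.

```python
def msb_match(register: int, master: int, k: int, width: int = 8) -> bool:
    """Compare only the k most significant bits of two width-bit counts."""
    shift = width - k
    return (register >> shift) == (master >> shift)
```

For instance, with k = 2 the counts 0b10110000 and 0b10000001 match, because both begin with 10, while 0b10110000 and 0b01110000 do not. Four state counts chosen to begin with 11, 10, 01 and 00 can therefore be verified with a 2-bit XOR per cell instead of an 8-bit one.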

This program/3-step verify procedure is repeated, with VCG incremented in each subsequent program step, until all cells in the chunk are verified or the max VCG level is reached, as described previously. This entire operation is then repeated for all remaining chunks of the sector, at which point sector multi-state data writing is complete.

A significant advantage of this novel approach is that it can be extended to a large number of multi-states (e.g. 16) without substantially impacting write performance. The only costs are those required for improved resolution (e.g. more and smaller VCG steps, or a lower drain programming voltage VPD to slow down the programming rate) and the additional time needed to sense/verify each of the additional states. The latter, being a read operation, tends to be much faster than programming, and therefore should not substantially impact write performance.

An alternative embodiment which speeds up the verify process is depicted in the diagram of Figure 11. In place of the single adjustable reference current source, multiple current sources (or parallel tap points of a master current source) are used. In one embodiment, the number of current sources is (n-1), where n is the number of states, since a current point is not needed for the fully erased state. A data-in register of size K is used for each cell in the chunk, where 2^K = n. The information written into the data register by the controller at the start of write is used to select one of the n-1 current levels during verify, dependent on the particular state. Upon verify, all cells of the chunk are compared simultaneously to their corresponding reference targets in a single verify operation, locking out further programming, on a cell by cell basis, if successful. This allows full verify to complete in one parallel operation, as opposed to the multi-step serial operation of the previously described embodiment, substantially improving verify speed. The cost is the requirement of the multiple current sources, counting and associated selection circuitry within each bit of the chunk. As in the multi-step embodiment, the data-in register requirement can be served by a portion (e.g. the MSB portion) of the existing readout register. The exclusive OR used in the embodiment of Figure 8 is now replaced with straight decoding to select the appropriate current source.
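A sketch of the single parallel verify follows, under these hypothetical assumptions. The K-bit data-in value equals the physical state index (no data scrambling). A cell counts as programmed when its sensed current is at or below the selected reference. The n-1 reference currents are given as integers, e.g. in microamps. In this model, straight decoding is just an index into the list of references.

```python
from typing import List, Optional, Sequence


def select_reference(data: int, refs: Sequence[int]) -> Optional[int]:
    """Decode a K-bit data-in value into one of the n-1 reference currents.
    State 0 (fully erased) needs no reference and returns None."""
    return None if data == 0 else refs[data - 1]


def parallel_verify(data: Sequence[int], currents: Sequence[int], k: int,
                    refs: Sequence[int]) -> List[bool]:
    """Return a lockout flag per cell after one verify of the whole chunk."""
    if len(refs) != 2 ** k - 1:
        raise ValueError("need n-1 references where n = 2**k")
    locked = []
    for d, i_cell in zip(data, currents):
        ref = select_reference(d, refs)
        # Erased-state cells are inhibited from the outset; the others lock out
        # once their current has dropped to the selected reference.
        locked.append(ref is None or i_cell <= ref)
    return locked
```

With K = 2, hypothetical references of 70, 35 and 10 uA, and cells targeted to P0 through P3 reading 100, 60, 40 and 5 uA, one call locks out the P0, P1 and P3 cells and leaves only the P2 cell for the next pulse.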

Twin-Cell Relaxation Alarm

An additional feature of the adaptive multi-state discrimination sensing of the present invention is the ability to put bounds on the extreme states: an upper bound for the highest state (e.g. physical state P0) and a lower bound for the lowest state, assuming that this lowest state is not already in cutoff. When the extreme states (as for example reflected within a subset of the tracking cells) cross those bounds, the data is deemed to be outside the limits of safe detectability vis-a-vis the available dynamic range. The sector data then either needs to be refreshed (rewritten), or the sector must be mapped out and replaced with a spare sector. However, this does not eliminate the need for maintaining a cumulative count of the number of write operations experienced per sector (referred to as "hot count"), since there is no warning at the time of writing that, once written, such excessive shift may occur. Such warning is the function of a "hot count ceiling": to put an upper bound on the amount of cumulative cell wear allowed, forewarning of the possibility of excess trapped charge and the associated margin loss due to its subsequent detrapping, termed relaxation. If such relaxation exceeds a critical value, the resulting common mode shift of all cells within the sector (noting that some form of data state rotation is being used to keep wear on all cells within the sector uniform), typically from less conductive to more conductive levels, becomes large enough to prevent discrimination between the highest two states (the fully erased state and the state just below it); i.e. the drift exceeds the dynamic range of the system. In order to avoid such failure, sectors cycled to such high trapping levels must be retired.

The hot count is an indirect indicator of such trapping, since in addition to the number of cycles experienced, cumulative trapping is sensitive to other factors such as the duty cycle of write operations, time between writes, operating and non-operating temperature exposure, etc.; i.e. history/details. When hot count is used as the criterion for mapping out a sector, it must assume worst case conditions to ensure no failure. However, in practice, systems using such memories rarely, if ever, experience such worst case history exposure in actual applications. Therefore, mapping out a sector based on cumulative hot count is often excessively premature for practical applications.

An alternative embodiment uses a "Twin-Cell" trapping gauge included within each sector, whose function is to directly detect the amount of channel trapping shift responsible for the relaxation. This provides a direct measure of the amount of wear actually seen by cells in the sector, comprehending both cumulative write cycles (hot count) and the history of sector exposure. Only when this cell's shift reaches a critical value will the sector be retired, and no hot count information is required to make this decision. This allows much higher endurance in actual system use than can be safely provided via hot count. Hot count can only provide a general indication of cumulative wear, since it gauges exposure rather than wear directly. It must therefore be heavily guardbanded (i.e. allowing only the number of writes that is safe under worst case wear), whereas the twin cell's direct measure of wear can minimize the amount of such endurance guardband.

One embodiment of a Twin-Cell of the present invention is depicted in Figure 12. It consists of a cell 600 having a single floating gate 601 but two separate sensing channels: channel 602 is a read/write (R/W) channel, and channel 603 is a read-only (RO) channel. Cell 600 is designed to match actual memory cells, e.g. by taking two adjacent memory cells and tying their floating gates together. Programming of cell 600 is performed through the read/write channel by raising bit line BL2 to a programming voltage (for example about 7 V) and grounding bit line BL1, while bit line BL0 is floated (or grounded). In this way, all the stress and trapping associated with hot electron programming is confined to the read/write channel 602. Performing an A/D read of read/write channel 602, followed by an A/D read of read-only channel 603, and finding the difference (e.g. by subtracting) gives a measure of channel trapping (delta). Early in a sector's life, with low cycling exposure, this delta is close to zero, while with progressive cycling the difference
grows, with read-only channel 603 giving higher A/D counts (appearing more erased) than read/write channel 602.

In one embodiment, the state set and used for this comparison is a middle intermediate state, offering both the widest range and the average wear of a cell. When the delta exceeds a critical value (e.g. 20 counts in the example of Figures 5a and 5b, corresponding to a cell current shift of 20 uA and 10 uA for the four and eight state encodings, respectively), the sector is at its limit with respect to wearout/relaxation or other potential read and reliability problems, and is retired.
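The retirement test reduces to a subtraction and a threshold. In this hypothetical sketch the A/D counts are plain integers, the 20-count limit is the example value quoted above, and the function name is illustrative.

```python
from typing import Tuple


def twin_cell_check(rw_count: int, ro_count: int, limit: int = 20) -> Tuple[int, bool]:
    """Return (delta, retire). The read-only channel reads more erased
    (higher count) as trapping builds up in the read/write channel."""
    delta = ro_count - rw_count
    return delta, delta > limit
```

Because the decision depends only on the measured delta, a sector that has seen many writes under benign conditions keeps a small delta and stays in service, where a hot count ceiling might already have retired it.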

In summary, the key points described thus far in this specification for supporting high density multi-state are:

1. Parallel, full chunk, A/D conversion of multi-state data, with adequate resolution to provide an analog measure of the encoded states;

2. Master reference cell(s) whose prime function is to provide optimum dynamic range for comparator sensing;

3. Logical to physical data scrambling to provide both intra-sector wear leveling and an approximately twofold increase in endurance capability;

4. Intra-sector tracking cell groups, one for each state, included in each sector to provide optimum compare points for the various states and able to adapt to any common mode shifts (e.g. relaxation). The groups also provide translation of the data rotation;

5. A controller incorporating a data processing "engine":
 a) to find, on the fly, the midpoint of each tracking cell group,
 b) with which to establish data state discrimination and marginality filter points,
 c) through which sector data is passed, giving both the encoded memory state and its quality (marginality) for each physical bit,
 d) optionally, to decide what actions must be taken to clean up (scrub) marginal bit data based on the quality information (e.g. full sector erase and rewrite versus selective write only);

6. Optionally, a small counter on each sector which is incremented each time a read scrub is encountered. When the count reaches the maximum allowed, the marginal bit(s) are mapped out rather than rewritten and the counter is reset to 0. This provides a filter for truly "bad" bits;

7. The same means applied in reverse to write multi-state data back into a sector, using the same circuitry as is used for read but operated in reverse, to provide self-consistent data encoding. In addition, two alternative embodiments for performing verification are taught:
 7a. Using a reference current staircase to sequentially scan through the range of states, conditionally terminating each cell as the current step corresponding to its target data is presented to the sensing circuit.
 7b. Using a full set of N-1 reference currents for the N possible states to simultaneously verify and conditionally terminate all cells;

8. A twin-cell option, which can be included in each sector to provide the deltaVt shift level associated with cycling-driven trapping and channel wearout. This triggers sector retirement before detrapping shifts exceed the read dynamic range or cause other potential read errors. It replaces hot count based sector retirement, greatly increasing usable endurance.

Enhancing Multi-State Speed by Utilizing Column Oriented Steering

An important goal for multi-state is achieving speed competitive with two-state devices, with respect to both write (data programming) and read. Maintaining comparably high performance is harder for multi-state than for binary encoded data for two reasons. The first is the considerably tighter margin requirements associated with multi-state encoding (given a limited total memory window budget). The second is that the information content per cell increases only logarithmically for a linearly increasing number of multi-state levels (i.e. 2^n levels give only n bits of information). So, along with margins, performance becomes a victim of the diminishing returns associated with increasing levels of multi-state.

In the embodiment discussed above with reference to Figure 10, write performance is heavily impacted by having to go progressively and carefully through each state, the progression requiring a sequential, multiple pulse/check methodology to set the state precisely, although in several embodiments verification speed can be increased, as discussed above. For example, to implement 4-state: erase sets up physical state P0; a first VCG staircase of up to 7 pulse/check steps sets up physical state P1; a second group of up to 6 pulse/check steps then sets up physical state P2; and a last programming step sets up physical state P3. This gives a total of 14 pulses to write two bits of information, or 7 pulses per bit, in place of the one pulse per bit for writing binary. Projecting this to 8-level multi-state, the total number of pulses would be more than 30, a further slowdown to more than ten pulses per bit.
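The pulse arithmetic above, written out. The 8-level figure is the text's projection, so only a lower bound is shown.

```latex
\begin{aligned}
N_{4} &= \underbrace{7}_{P_1} + \underbrace{6}_{P_2} + \underbrace{1}_{P_3} = 14 \text{ pulses},
  &\qquad \frac{N_{4}}{\log_2 4} &= \frac{14}{2} = 7 \text{ pulses/bit} \\
N_{8} &> 30 \text{ pulses},
  &\qquad \frac{N_{8}}{\log_2 8} &> \frac{30}{3} = 10 \text{ pulses/bit}
\end{aligned}
```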

Thus far, read performance has not been impacted, for two reasons. The first is the feature of concurrent multi-state sensing using multi-leg cell current mirroring to n-1 sense amps (e.g. three sense amplifiers for 4-state). The second is the stream read feature appropriate for mass data storage, wherein, other than latency, the actual cell read time is hidden by the stream read implementation, which simultaneously shifts out a large chunk (e.g. 256 bits) of previously read data while current data is being sensed.

For more aggressively scaled multi-state implementations, both of the above features will become inadequate. With respect to the first, static current sensing becomes increasingly unattractive for two reasons. IR drops increase with physical scaling, and memory window requirements increase while sensing margins decrease. In addition, high value multiple current levels bring higher power consumption. A more attractive way to sense multi-states is via voltage margining, which requires only minimal cell current (as for example using dynamic type sensing), but dictates stepping through the range of control gate voltage margin levels spanning the states (for n states, this means a minimum of n-1 steps), an example of which is given in the above referenced analog dynamic-type sensing embodiment. This impacts the stream read feature, however, because the time consumed in actually stepping through the various margin levels, followed by sensing, now increases greatly. When this is combined with the progressive demand for still higher data rates in mass storage, it will become increasingly difficult to exploit stream read to achieve enhanced performance. In addition, write performance can also be significantly impacted by internal read speed limitations. Read is an integral component in reliably setting the individual states (via program/verify loops), as well as in post-write sector data checking.
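To make the n-1 step cost concrete, here is a hypothetical model of voltage-margin reading in a row-oriented array, where all cells of the chunk share one control gate level per step. The threshold voltages and margin levels are made-up numbers. The sketch only counts steps and assigns each cell the first margin level at which it conducts.

```python
from typing import List, Sequence, Tuple


def voltage_margin_read(vts: Sequence[float], margins: Sequence[float]) -> Tuple[List[int], int]:
    """vts: cell threshold voltages; margins: n-1 ascending control gate levels.

    Returns (states, steps). State 0 is the lowest-threshold (erased) state;
    a cell that never conducts is in the highest state, n-1."""
    states = [len(margins)] * len(vts)
    resolved = [False] * len(vts)
    for step, vcg in enumerate(margins):      # one shared control gate level per step
        for i, vt in enumerate(vts):          # all cells sensed in parallel at this level
            if not resolved[i] and vt < vcg:  # conducts: threshold below the gate level
                states[i], resolved[i] = step, True
    return states, len(margins)
```

However many cells are sensed in parallel, the step count grows with the number of states: 3 steps for 4 states, 7 for 8 states, 15 for 16 states.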

So with more aggressive use of multi-state for scaling, based on the above scenario, performance will continue to decline. The above referenced analog sensing embodiment improves performance by supporting a large degree of parallelism. Greater parallelism is one way to retard the decline in performance associated with increasing numbers of cell states. However, the use of a virtual ground array (imposing a separation between simultaneously addressable cells), plus the constraint of a 512 byte sector size granularity, places a limit on how far parallelism can be pushed.

The embodiments of this invention described in the following section offer a solution to the above performance limitations by substantially cutting down the number of discrete steps required for both programming and read, while preserving the desirable features associated with analog/voltage margin sensing taught by the present invention.

Given that a dominant controlling element allowing differentiation between the various multi-state levels is the control gate (or equivalently termed steering gate), the key to reducing the number of discrete steps used for both read and write is to simultaneously apply, to the full group (chunk) of cells, control gate voltage values associated with each cell's particular data state requirements, on a cell by cell basis.

In a row oriented sector, in order for the control gate to be individually adjustable for each cell, it cannot run in the row line direction, since it then becomes common to all cells which are to be simultaneously operated on. Rather, it needs to run in the column (bit line) direction, which allows it to be both individually adjustable on a cell by cell basis and individually responsive to the sensing result on the associated cell bit line. The basic elements of one embodiment of such a cell are shown in Figure 13. Since control gate 71 runs parallel to bit lines 72-1 and 72-2, control gate 71 cannot also serve as the select line (which is the usual case in EPROM and FLASH memories), since unique cell selection along a bit line dictates that the select line run perpendicular to the bit line. This forces the select line to run in a different layer, which in one embodiment is a poly3 line, with the control (steering) gate being a poly2 line and the floating gate built from poly1. Specific exemplary embodiments of cell structures suitable for use in conjunction with this aspect of the present invention are described later.

Cell Read Operation

A cell as in Figure 13 is read using the control gate in an A to D type binary search, as illustrated in the exemplary embodiment of Figure 14 and the flowchart of Figure 15. Each sensing circuit consists of Sense Amplifier (SA) comparator 81, having one input lead which receives an input signal from memory cell 99 via bit line 82-2, and another input lead receiving an input signal from a global reference circuit (not shown) which provides reference signal Iref. The output of comparator 81 is used to update a corresponding n-bit Control Gate Register Element (CGRE) 83, the number of bits being governed by the required sensing resolution (e.g. if a 1 in 64 resolution is desired, a six bit register is used). The value stored in CGRE 83 is then used to provide the next control gate read voltage VCG, via the corresponding Next Step Processor (NSP) 84, in a successive approximation scheme.

Following is an example of the read operation flow, as depicted in the flowchart of Figure 15. CGRE 83 is a 6-bit binary register element, with a corresponding dynamic range on the control gate (via NSP 84) of 0v to 7.875v in 125mv steps. Read starts with the binary value 100000 (Nold) loaded into the CGRE, giving the midpoint VCG of 4v. The output from sense amp 81 is then fed back into control gate register 83, via Conditional Element 89, according to the relation:

Nnew = Nold + Output*DN;

where (for flowcharting convenience) Output is defined as:

-1 if Icell >= Iref, and
+1 if Icell < Iref;

and where DN = 010000, giving a next CGRE (or VCG) of:

010000 (or 2v) if Icell >= Iref, and
110000 (or 6v) if Icell < Iref.

In this way, if cell current is higher than Iref, the next VCG will be lower, reducing the cell current. Along with this next VCG, Next Step Processor 84 carries Nnew forward as the next Nold and generates the next DN = DN/2. This binary search continues five more times (for a total of 6 passes), wherein the last CGRE 83 value becomes the digital equivalent of the floating gate memory state. If the memory cell uses an 8-level (three logical bits/cell) multi-state encoding, this gives three bits of resolution between states for state-to-state discrimination, guardbanding, margining, etc. Data can then be processed in ways similar to those described in the afore-referenced Analog Sensing embodiment, the difference here being the rapid binary search methodology (as opposed to a one-step-at-a-time sequential search), which for 1 in 64 bit resolution represents a 10X performance improvement (six steps in place of a possible total of around 64 steps).
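The read flow above can be sketched as a short simulation. This is a minimal sketch, not the circuit of Figures 14 and 15: the cell is represented by a hypothetical threshold voltage `vt`, the sense amplifier is assumed to report Icell >= Iref exactly when VCG exceeds `vt`, and the 0.125 V step and 6-bit register follow the example. The first five passes apply Nnew = Nold + Output*DN with DN halving each time; by the sixth pass DN has dropped below one register step, so the sketch resolves the last bit by keeping N or stepping down by one, which is one common way to close out this kind of successive approximation.

```python
from typing import List, Tuple

STEP_V = 0.125   # volts per CGRE count (0 V to 7.875 V over 6 bits)
N_BITS = 6       # CGRE width, giving 1-in-64 resolution


def cell_conducts(vcg: float, vt: float) -> bool:
    """Hypothetical sense-amp model: Icell >= Iref once VCG exceeds the cell threshold."""
    return vcg > vt


def binary_search_read(vt: float, n_bits: int = N_BITS,
                       step_v: float = STEP_V) -> Tuple[int, List[float]]:
    """Return the final CGRE value and the VCG applied on each pass."""
    n = 1 << (n_bits - 1)    # Nold = 100000, i.e. VCG = 4 V at mid-range
    dn = 1 << (n_bits - 2)   # DN = 010000
    vcg_per_pass = []
    for _ in range(n_bits - 1):
        vcg = n * step_v
        vcg_per_pass.append(vcg)
        output = -1 if cell_conducts(vcg, vt) else +1
        n = n + output * dn  # Nnew = Nold + Output*DN
        dn //= 2             # next DN = DN/2
    # Sixth pass: resolve the least significant bit.
    vcg = n * step_v
    vcg_per_pass.append(vcg)
    if cell_conducts(vcg, vt):
        n -= 1
    return n, vcg_per_pass


code, trace = binary_search_read(3.3)
print(code, trace)   # 26 [4.0, 2.0, 3.0, 3.5, 3.25, 3.375]
```

Each pass costs one sensing operation, so six passes stand in for the roughly 64 single-step margin levels of a sequential sweep.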

In one embodiment, sensing is extended to a full chunk of bits (e.g. 128 bits per chunk), wherein each sensing circuit contains its own corresponding SA, CGRE, and NSP elements, as depicted in the embodiment of Figure 16, in which the operation of each sensing circuit is conditional on its corresponding memory cell. In this way, the strength of the binary search approach is exploited to recover most of the lost read performance. For example, comparing the above example to a two-state read, and assuming that each individual step of the binary search takes an amount of time comparable to that of two-state sensing, the total time expended in the multi-state read is equal to 6 binary reads. For 8-state encoding, three bits of information are extracted, resulting in a read time per logical bit of only twice that of binary state reading. Given that margin information is concurrently available as well (as described above), this offers an excellent level of read performance, consistent with a stream read implementation.
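The chunk-wide read can be sketched by running the same successive approximation on every cell of a list at once, each list element standing for one sensing circuit with its own SA, CGRE and NSP. This minimal sketch represents each cell by a hypothetical threshold voltage, with the sense amplifier assumed to report Icell >= Iref once VCG exceeds it. The split of the 6-bit result into a 3-bit state and a 3-bit position within the state assumes, for illustration only, that the eight states sit on evenly spaced boundaries every eight register counts; `split_state` is a hypothetical helper.

```python
from typing import List, Tuple

STEP_V = 0.125   # volts per CGRE count
N_BITS = 6       # CGRE width
STATE_BITS = 3   # 8-state encoding


def read_chunk(vts: List[float], n_bits: int = N_BITS,
               step_v: float = STEP_V) -> Tuple[List[int], int]:
    """Sense all cells of a chunk together; return final CGRE values and pass count."""
    codes = [1 << (n_bits - 1)] * len(vts)  # every CGRE starts at 100000
    dn = 1 << (n_bits - 2)
    passes = 0
    for _ in range(n_bits - 1):
        passes += 1
        # One sensing pass: each cell sees its own VCG, all cells at the same time.
        on = [c * step_v > vt for c, vt in zip(codes, vts)]
        codes = [c - dn if o else c + dn for c, o in zip(codes, on)]
        dn //= 2
    passes += 1  # final pass resolves the least significant bit
    codes = [c - 1 if c * step_v > vt else c for c, vt in zip(codes, vts)]
    return codes, passes


def split_state(code: int, n_bits: int = N_BITS, state_bits: int = STATE_BITS):
    """Hypothetical decoding: upper bits give the state, lower bits the margin position."""
    margin_bits = n_bits - state_bits
    return code >> margin_bits, code & ((1 << margin_bits) - 1)


chunk = [((7 * i) % 64 + 0.5) * STEP_V for i in range(128)]  # 128-cell chunk
codes, passes = read_chunk(chunk)
print(passes, passes / STATE_BITS)   # 6 2.0
```

The pass count depends only on the register width, not on the number of cells in the chunk, which is what lets six passes yield three bits per cell at two binary-read equivalents per logical bit.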

Cell Programming Operation - Programming Phase Specific

In certain embodiments, the same elements used for reading are also applied to accelerate multi-state programming, again optimized to the targeted memory state on a cell by cell basis, as illustrated in the example of Figure 17. Here, the CGRE X83 is initialized with the optimum safe starting value for the particular state (this may come from a set of updatable parameters stored within the sector). In memory cells whose magnitude of programming (e.g. programming Vt) increases with increasing VCG, this optimum safe starting point is the highest value of VCG allowable that will not cause the memory cell to program excessively, overshooting its targeted state (i.e. overshooting its allowed state range). Starting at lower values than this optimum value, while safe, costs more programming time, because the earlier programming pulses do not provide a sufficient magnitude of programming towards the targeted state, thereby decreasing write speed. In one embodiment, a different relationship of VCG with CGRE from that of read is used to satisfy the dynamic range required for programming (e.g. by adding constant voltage Kprog, as indicated in the exemplary embodiment of Figure 17). Following each programming pulse, a verify operation is performed. In the class of cells described above, if the programming margin target is not achieved, the CGRE value is incremented by 1, with a corresponding incremental voltage increase on VCG via NSP element 191 for the next programming step, whereas if margin is reached, further programming on that bit is locked out by disabling further application of programming voltage on its associated bit line and, optionally, eliminating application of VCG as well.

In one embodiment, this operation is performed simultaneously on all bits within the chunk, each bit starting at its optimal VCG, conditional on its corresponding to-be-programmed data. In this way, programming is completed in about six steps, relatively independent of the level of multi-state (e.g. 4, 8, or 16 level multi-state cells are, in accordance with this embodiment, programmable in a comparable number of pulses), in place of the more than 30 programming steps indicated earlier for a fully sequential 8-level multi-state programming embodiment. This not only represents a 5X write speed improvement but, given that three bits are being encoded, also gives an effective number of program/verify passes of two passes per bit, only twice that of binary encoding. Since a full write operation includes additional time overhead above and beyond program/verify, this smaller difference in program speed may translate, in practice, to only a minor reduction in overall write speed as compared to binary encoded writing.
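The data conditional program/verify loop can be sketched for a whole chunk. This is a minimal sketch under loudly hypothetical assumptions: a pulse at VCG is taken to raise a cell's threshold to VCG minus a per-cell offset (a simple staircase model), the offsets are assumed to lie between `OFFSET_MIN` and `OFFSET_MAX`, and `K_PROG`, `WINDOW_V` and all voltages are illustrative values, not figures from the text. Each cell's CGRE starts at the highest code that cannot push even the fastest cell past its state window, is incremented by one after each failed verify, and is locked out once verify passes.

```python
import math
from typing import List, Tuple

STEP_V = 0.125                      # CGRE step, volts
K_PROG = 10.0                       # hypothetical Kprog added to VCG during programming
WINDOW_V = 0.25                     # hypothetical width of each state above its verify level
OFFSET_MIN, OFFSET_MAX = 9.0, 9.6   # hypothetical spread of (VCG - Vt) after a pulse


def safe_start_code(target_v: float) -> int:
    """Highest CGRE code whose first pulse cannot push the fastest cell past its window."""
    return math.floor((target_v + WINDOW_V + OFFSET_MIN - K_PROG) / STEP_V)


def program_chunk(targets_v: List[float], offsets_v: List[float],
                  max_pulses: int = 64) -> Tuple[List[float], int, List[bool]]:
    """Program/verify every cell of a chunk in parallel; return Vts, pulse count, lock flags."""
    cgre = [safe_start_code(t) for t in targets_v]  # data-conditional starting point
    vt = [-2.0] * len(targets_v)                    # erased cells
    locked = [False] * len(targets_v)
    pulses = 0
    while not all(locked) and pulses < max_pulses:
        pulses += 1
        for i, target in enumerate(targets_v):
            if locked[i]:
                continue                            # locked out: no further pulses
            vcg = cgre[i] * STEP_V + K_PROG
            vt[i] = max(vt[i], vcg - offsets_v[i])  # hypothetical staircase cell model
            if vt[i] >= target:                     # verify passed: lock this bit out
                locked[i] = True
            else:
                cgre[i] += 1                        # one CGRE step higher next pulse
    return vt, pulses, locked


def chunk_for(n_states: int) -> Tuple[List[float], List[float]]:
    """Illustrative chunk: every state paired with five cell speeds across the offset spread."""
    spacing = 6.0 / n_states
    step = (OFFSET_MAX - OFFSET_MIN) / 4
    targets = [1.0 + spacing * s for s in range(n_states) for _ in range(5)]
    offsets = [OFFSET_MIN + step * k for _ in range(n_states) for k in range(5)]
    return targets, offsets


for n_states in (4, 8, 16):
    targets, offsets = chunk_for(n_states)
    vt, pulses, locked = program_chunk(targets, offsets)
    print(n_states, "states:", pulses, "pulses, all locked:", all(locked))
```

Under this toy model the pulse count is set by the spread of cell speeds rather than by the number of states, which mirrors the claim that 4, 8 and 16 level cells program in a comparable number of pulses.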

Cell Programming Operation - Verify Phase Specific

Cell verify can also be made state specific, using the same CGRE/NSP engine described above with reference to Figure 17, by loading the targeted verify voltage (i.e. the value corresponding to the to-be-programmed data) into its associated CGRE. In this embodiment, unlike the read operation, for which VCG is changed during the read binary search flow, the state specific VCG verify voltage is kept fixed during the full program/verify flow (i.e. the NSP output for verify remains unchanged). In this way, all cells within a chunk are verified simultaneously, with further programming locked out, on a cell by cell basis, as each cell passes the verify operation.

This data conditional, high performance verify embodiment complements the above described high performance, data conditional programming embodiment, offering a highly parallel, high speed methodology for setting a many level multi-state memory. In one embodiment, in order to better exploit this capability, two different CGRE/NSP circuits are used, as illustrated in Figure 18. CGRE/NSP circuit 91 is used to support programming, and CGRE/NSP circuit 92 is used for verify, allowing these two circuits to be multiplexed at high speed onto the control gate when changing between programming and verify operations.

Although using the individual, cell by cell VCG supply as in this embodiment offers an excellent approach to supporting a high level of multi-state at high speed, it puts the burden on quickly providing all these VCG voltages. In one embodiment, all the possible voltage steps are generated and available simultaneously on a bus of voltage feed lines. In this embodiment, each CGRE value is used to decode which one of these feed lines to connect to its corresponding control gate. This embodiment is attractive when there are not too many VCG levels to manage. Since in principle only seven compare points are needed for discriminating 8 states (and only 15 compare points are needed for discriminating 16 states), this will often be suitable. However, this limits the high speed flexibility to dynamically tune the sense points and determine margins. If the need for attaining such full resolution is very rare (as for example when ECC indicates a memory state failure or a marginality problem), an alternative, hybrid embodiment is provided which only demands such capability rarely (e.g. on the rare ECC flag). On those rare occasions, those compare points are incrementally shifted to fully resolve the margins, albeit via a more time consuming procedure, because now voltage values will need to be provided which are not included in the limited set of supply levels (e.g. 7 to 15 levels) concurrently available. This would dictate temporarily generating new voltage levels, not concurrently available, consuming more time, and potentially breaking up the concurrent parallel chunk operation into operations on individual bits or small groups of bits to feed these specialized voltage levels.
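The feed line bus and its hybrid fallback can be sketched as a decode step. In this minimal sketch the CGRE holds a requested level in register counts, the bus carries only the n-1 compare levels (placed, purely for illustration, every eight counts starting at count 4), and any request that is not on the bus is reported for the slower path that generates a temporary level. The function names and the level placement are hypothetical.

```python
from typing import List, Optional, Tuple


def compare_levels(n_states: int, first: int = 4, spacing: int = 8) -> List[int]:
    """Register codes of the n_states - 1 concurrently generated feed lines (illustrative)."""
    return [first + k * spacing for k in range(n_states - 1)]


def connect(cgre_codes: List[int],
            bus: List[int]) -> Tuple[List[Optional[int]], List[int]]:
    """Decode each CGRE value to a feed line index; collect cells needing the slow path."""
    line_of = {code: line for line, code in enumerate(bus)}
    lines: List[Optional[int]] = []
    slow_path: List[int] = []
    for cell, code in enumerate(cgre_codes):
        line = line_of.get(code)
        lines.append(line)
        if line is None:
            slow_path.append(cell)   # e.g. a shifted compare point after an ECC flag
    return lines, slow_path


bus = compare_levels(8)
print(len(bus), bus)               # 7 [4, 12, 20, 28, 36, 44, 52]
print(connect([12, 20, 13], bus))  # ([1, 2, None], [2])
```

Only the cells reported in the slow path list would need a temporarily generated level; the rest of the chunk keeps its concurrent feed lines.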

In the case where a large number of VCG voltage possibilities and/or all VCG voltage possibilities are required (i.e. full real-time margining capabilities for full dynamic range flexibility), one alternative embodiment, similar to the embodiment of Figure 17, expands the CGRE X83 and NSP 191 elements to include sample-and-hold circuitry for each sensing circuit, the full complement of which is fed by a common, single staircase voltage source. The voltage delivered by each NSP is conditional on its corresponding stored CGRE value. Care must be taken in such an embodiment to ensure that the dynamic nature of sample and hold circuitry, with its potential for drift, and the time requirements for scanning/sampling the full dynamic voltage range do not cause programming voltage Vpg error. The benefit of this embodiment is that it incurs smaller area and power penalties.
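The sample-and-hold alternative can be sketched as one sweep of a shared staircase. This is a minimal sketch: CGRE codes are assumed to lie within the 6-bit range, each cell's hold circuit is assumed to capture the staircase on the step that matches its CGRE value, and it then droops at a hypothetical linear rate until the sweep ends, when the held voltages would be applied. It illustrates why drift and sweep time matter: cells sampled early in the sweep hold longest and lose the most.

```python
from typing import List

STEP_V = 0.125   # staircase step, volts
N_CODES = 64     # staircase covers the full 6-bit range


def sample_and_hold(cgre_codes: List[int], droop_v_per_step: float = 0.0) -> List[float]:
    """Voltage held by each cell's S/H at the end of one staircase sweep."""
    held = [0.0] * len(cgre_codes)
    for step in range(N_CODES):                  # common staircase source
        v = step * STEP_V
        for i, code in enumerate(cgre_codes):
            if code == step:                     # NSP samples when the staircase matches
                held[i] = v
    # Hypothetical linear droop over the remaining steps of the sweep.
    return [h - droop_v_per_step * (N_CODES - 1 - code)
            for h, code in zip(held, cgre_codes)]


def worst_error(cgre_codes: List[int], droop_v_per_step: float) -> float:
    """Largest deviation from the ideal level across the given cells."""
    ideal = [c * STEP_V for c in cgre_codes]
    actual = sample_and_hold(cgre_codes, droop_v_per_step)
    return max(abs(a - b) for a, b in zip(actual, ideal))


print(worst_error([0, 32, 63], 0.001))   # about 0.063 V, set by the code-0 cell
```

A design check under this model would compare the worst-case error over a full sweep against the tolerance allowed on Vpg.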

It is desired to simultaneously process the data in each of the CGREs, based on the associated sense amplifier result and the previously stored value (as well as the step in progress in the case of read), conditional on the operation in progress. This is most complex for read, involving the manipulation for successive approximation (basically providing an up/down counting function, conditional on the sensed result and the current iteration step). For programming and verify the requirements are simpler, the complexity coming primarily in initializing each of the CGREs to the corresponding data values; once initialized, nothing further is required for verify, while programming requires only incrementing by one for each successive program/verify step. Notwithstanding these complexities, the required circuit area and circuit complexity should not differ substantially from approaches which use multiple sense amplifiers. The prior art approach uses multiple sense amplifiers (e.g. requiring up to seven sense amplifiers for 8-level multi-state). In accordance with this embodiment, the multiple sensing circuits and associated current mirrors and reference legs are replaced by one sense amplifier circuit, a couple of registers with associated decoder functions, sample and hold circuits, and some glue logic.
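The per-cell processing described above can be sketched as a single update rule applied to every CGRE in parallel. The `Op` enumeration and `nsp_update` function are hypothetical names for this sketch, and `sensed_on` means the sense amplifier reported Icell >= Iref for that cell. For read the rule is the successive approximation step Nnew = Nold + Output*DN; for programming, where the sense result is assumed to come from the preceding verify, a cell that still conducts at its verify level is stepped up by one; for verify the register is left unchanged.

```python
from enum import Enum
from typing import Tuple


class Op(Enum):
    READ = "read"
    PROGRAM = "program"
    VERIFY = "verify"


def nsp_update(op: Op, cgre: int, sensed_on: bool, dn: int = 0) -> Tuple[int, int]:
    """Return (next CGRE value, next DN) for one cell.

    sensed_on -- sense-amp result, True when Icell >= Iref
    dn        -- current read step size; unused for program and verify
    """
    if op is Op.READ:
        output = -1 if sensed_on else +1   # Output as defined for the read flow
        return cgre + output * dn, dn // 2
    if op is Op.PROGRAM:
        # Still conducting at the verify level means the target is not yet reached.
        return (cgre + 1 if sensed_on else cgre), dn
    return cgre, dn                        # VERIFY: target level stays fixed


# The whole chunk is updated in one step, each cell from its own sense result.
chunk = [(32, True), (32, False), (40, True)]
print([nsp_update(Op.READ, c, s, 16) for c, s in chunk])  # [(16, 8), (48, 8), (24, 8)]
```

Read is the only branch that needs both the sense result and the iteration state (DN), which matches the observation that read carries most of the per-register complexity.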

The other major element of complexity is that of shifting out and processing the large body of data stored in the chunk-wide CGRE register. One embodiment used is similar in this regard to that described in the above-referenced analog sensing embodiment.



56933 1 
08/04/97 



Exemplary Cell Embodiments



Firstly, independent of other considerations, a memory cell must be competitive with respect to physically small size and scalability. Beyond that, however, based on the cell requirements described above for a row selectable but column steerable element, as represented in the example of Figure 13, the choices are limited.

Furthermore, in order to realize such a cell/array in minimal area, it must incorporate a virtual ground architecture, and this is not just because of the approximately 50% additional area associated with using the conventional 1/2 contact per cell array. The joint requirement of bit line and steering line running in the same direction, with the bit line having to physically run above yet periodically drop below the steering line to contact diffusion, dictates that they run side by side rather than be stacked. Whereas this occurs naturally in the virtual ground array, wherein active transistors are laterally displaced from the bit lines, in the conventionally contacted cell array the active transistors, while displaced from the bit line contacts themselves, do lie directly below the bit line conductor. For this reason, select/steering functions in such arrays are generally row oriented, eliminating the conflict. To do otherwise further increases cell area.

One memory cell which meets all the above requirements is the virtual ground, split gate cell having column oriented poly2 steering gates and row oriented poly3 select gates. For reference purposes this will be referred to as cell embodiment 1. Such a cell can be programmed using either conventional drain side programming or source side programming, depending on whether the poly3 select transistor is strongly turned on or throttled down, respectively. Erase is also row oriented, using poly3 as the erase line, thereby achieving the row oriented sector. The source side programming version of this is described in U.S. Patent number 5,313,421, assigned to Sandisk Corporation. For reference purposes, this version will be referred to as cell embodiment 1a.

Another suitable cell is the dual floating gate variant of cell embodiment 1a, such as is described in copending U.S. patent application serial no. 08/607,951 filed February 28, 1996 and assigned to Sandisk Corporation, which offers a true cross-point cell (4*lambda^2 per physical bit). For reference purposes this version will be referred to as cell embodiment 2. However, because of the series nature of the tri-gate structure (the two floating gate channels being in series), it is constrained to using source side programming, and will be more limited in how many levels of multi-state are realizable. Nevertheless, its inherently smaller cell size, self-alignment features and consequent scalability make it as attractive as the simpler but somewhat larger cell embodiment 1a.

Column Pitch/Segmentation Options

Because of the requirement within each cell to have both bit line and steering line (control gate) running parallel to each other (for convenience, their direction is henceforth defined as vertical), the question of bussing/pitch requirements arises. To achieve a physically minimal cell, the lateral extent (horizontal width) of the cell must be close to the minimum feature pitch (i.e. about 2*lambda), forcing the above two lines to fit in that pitch. At the cell level this is not a problem, since the steering line and bit lines tend to run side by side, and, more importantly, they are on different layers (poly2 and BN+, respectively), eliminating proximity/overlay constraints. However, going from the local to the global interconnect level is a challenge.

For ultra high density Flash memory, one way to interface long bit line columns to the memory cell array is via column segmentation. This approach uses the continuous (vertically) running metal lines as global bit lines, which drop down periodically to local diffusions serving memory sub-arrays or "segments" (e.g. 16 sectors) via segment select switching transistors. In this way array segments are isolated from one another, eliminating the large cumulative parasitics of leakage current and capacitance, and providing column associated defect and repetitive disturb confinement. This also provides an opportunity for relaxing the pitch requirement of the global bit lines from one per cell to one per two cells, depending on the segment selection approach used (e.g. U.S. Patent 5,315,541 assigned to Sandisk Corporation).

With respect to the steering line, first consider the cell/array using cell embodiment 1, which requires one steering line per column of cells. One possibility is to have this be a continuous global line, i.e. running continuously (vertically) through the entire memory array. Running through the memory cell sub-array portion poses no obstacle, the line readily fitting within the existing pitch. However, it may run into obstacles when trying to cross the segment select portions, which bound those sub-arrays. Other issues with this approach are the associated large RC time constants (impacting the speed of charging and discharging a long, resistive line), and the increased array exposure to repetitive disturb.

For those reasons, segmentation is also desirable for the steering function. Consequently, given that at most one metal line can be run in the pitch of one cell, both global metal bit lines and global steering lines can be shared between pairs of cells. Such sharing in the case of a global metal bit line is described in the above referenced U.S. Patent 5,315,541. It uses a staggered, interlaced segmentation architecture with a transfer network driven by four decode lines per segment pair, thereby allowing each metal bit line to run in the pitch of two cells.

Similar sharing can also be achieved for the steering lines, an example of which is shown in Figure 19 (and this is only one of many possible configurations). In this embodiment, there are four steering transfer lines driving the transfer matrix, with one global steering line per two cell columns within the segment. When cells are selected, the steering transfer network connects the corresponding local steering lines to unique global steering lines (e.g. Sk connected via SDT(4)). Each selected global steering line is connected in turn by the chunk select (i.e. column or y-select) circuitry to the CGRE circuitry.

Those steering lines which are not currently active may be floated or held at ground. If grounded, this raises the possibility of having a subset of the local steering lines, associated with a subset of cells which are not currently being operated on, held at ground through appropriate enabling of other SDT lines. For example, referring to Figure 19, let Sk be the selected global steering line, and SDT(4) be the selected transfer line. If it is not desirable to have steering potential applied to unselected cells on the selected row, SDT(3) should be held at ground. However, both SDT(1) and SDT(2) can be turned on, allowing the neighboring cells on either side of the selected cell to have grounded steering lines.

The reason that it may be undesirable to have unselected cells on selected rows receive high steering potential arises primarily during programming, when channels are conducting. Even here, however, the unselected cells see interchanged source and drain bias conditions and low drain to source potentials, eliminating parasitic programming. Given this, in another embodiment, the four SDT select lines per segment are replaced with a single SDT line, simplifying decoding and potentially reducing layout area (although, because of the narrow cell pitch, area reduction is primarily governed by select transistor and vertical interconnect related dictates).

Having floating local steering lines (e.g. in all the unselected segments) does raise issues. It is undesirable that any of these lines drift to, or be left at, such a high potential that they can promote disturbs. However, with properly designed transfer transistors, which remain solidly cut off when unselected, diffusion leakage will maintain floating steering plates at ground (i.e. at substrate potential). In addition, making sure that all actively driven steering lines are fully discharged before isolating them will ensure that all steering lines are close to ground at all times except when actually selected/driven.

In addition to disturbs, large voltages on control gates of unselected cells can introduce excessive adjacent cell leakage, impacting proper multi-state setting and sensing. However, this is not an issue for the above-mentioned cell embodiment 1 implementation when voltage sensing is used, by virtue of its poly3 select function being independent of the sensing related steering function. This allows the select transistor to be throttled down (i.e. biased to a minimal turn-on level such as ≤5 µA), with the state-determining conduction occurring when the control gate reaches or exceeds the floating gate transistor's turn-on (or margin) voltage. This select-transistor-limited current strategy guarantees that, independent of how strongly conducting the floating gate channel may be, parasitic adjacent cell leakage problems are completely eliminated.

The same strategy can be applied to the dual floating gate cell embodiment 2, as illustrated in Figure 20. In this embodiment, the unit memory cell, consisting of two floating gate elements and taking up a pitch of 4*lambda, has associated with it a single bit line diffusion (the other bounding bit line diffusion being associated with the neighboring cell). Therefore, global metal bit lines are naturally reduced to one line per 4*lambda. This also facilitates laying out the segment transistor matrix (e.g. non-interlaced, fully confined segmentation via a one-to-one segment transistor to local BN+ network), which requires only one segment select per array segment. The steering transfer matrix is driven by two transfer lines per segment, coupled with global (metal) steering lines laid out in the pitch of one line per 4*lambda.

When a transfer line is enabled, it turns on the steering selection transistors for both of the control gates within a cell, for each alternate cell. Each of the two control gates within each of the selected cells is driven by a unique global steering line, which, as in the above described cell embodiment 1 case, is driven, in turn, by the segment select and CGRE circuitry. Also, as in the cell embodiment 1 case, the issue of floating local steering lines exists, with a similar resolution.

With either cell embodiment, in order to fully capitalize on speed, it is important to make the chunk size as large as possible, maximizing parallelism. Because of the low cell read and programming currents inherent to both cell embodiment 1 and 1a approaches, peak power is not an issue, nor is adjacent cell leakage, which becomes insignificant. Consequently, the number of floating gates per chunk which can be simultaneously operated on is limited only by segment decode restrictions. With the segmentation approach described, this allows every fourth floating gate to be addressed and operated on simultaneously, in both cell variants.

In the case of cell embodiment 1, every fourth diffusion is brought to drain potential, and there are three cells under reversed D/S bias conditions between the drain and the next driven ground. Once the first set of cells is completed, operation proceeds to the neighboring set. After the fourth such repetition, the full row is completed.

In the dual floating gate embodiment 2 case, wherein every other cell is selected, the biasing approach is different. Two adjacent diffusions are driven to drain potential, followed by two adjacent diffusions driven to ground, with that pattern repeated over and over. In this way global D/S bias is applied in mirrored fashion to every other one of the selected cells, resulting in the targeted floating gate of odd selected cells being the opposite of that of the even selected cells. Appropriate biases are placed on the global steering lines to satisfy the operation of the targeted floating gates. Once done, the bias conditions for both global bit/ground lines and targeted/untargeted floating gate steering lines are correspondingly interchanged to act on the other floating gate in the selected cells. Once finished, a similar operation is repeated on the alternate set of cells, completing full row programming in 4 passes.
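The four-pass row schedules for the two cell embodiments can be sketched as index lists. The numbering is hypothetical: floating gates are indexed left to right along the row, and in the dual floating gate case cell c owns gates 2c (left) and 2c+1 (right). The sketch only tracks which floating gates are operated on in each pass, including the mirrored choice of gate between successive selected cells in embodiment 2; it does not model the bias levels themselves.

```python
from typing import List


def passes_embodiment1(n_fg: int) -> List[List[int]]:
    """Cell embodiment 1: every fourth floating gate per pass, then the neighboring set."""
    return [list(range(start, n_fg, 4)) for start in range(4)]


def passes_embodiment2(n_cells: int) -> List[List[int]]:
    """Dual floating gate embodiment 2: every other cell per pass, mirrored bias."""
    schedule = []
    for first_cell in (0, 1):        # one set of alternate cells, then the other set
        for swap in (0, 1):          # interchange biases to reach the other gate
            selected = range(first_cell, n_cells, 2)
            # Successive selected cells see mirrored D/S bias, so the targeted side alternates.
            schedule.append([2 * c + (k + swap) % 2 for k, c in enumerate(selected)])
    return schedule


row1 = passes_embodiment1(1500)
row2 = passes_embodiment2(750)       # 750 dual-gate cells = 1500 floating gates
print(len(row1), [len(p) for p in row1])   # 4 [375, 375, 375, 375]
print(len(row2), [len(p) for p in row2])   # 4 [375, 375, 375, 375]
```

In both cases each floating gate in the row is visited exactly once over the four passes, with a quarter of the row operated on in each pass.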

To give an idea of the power of this approach, in a physical row of 1500 floating gate elements, encoded in 8-state (three bits per cell), 375 physical bits or 1125 logical bits are being operated on at one time. Assuming it takes nine pulses to complete programming, this gives a programming rate of 125 logical bits, or about 16 bytes, per programming pulse, plus similar gains in performance achievable for read. Existing two-state based flash products, by way of comparison, program around 32 bytes per programming pulse, putting the multi-state approach potentially within a factor of two in write speed.
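The throughput estimate above is simple arithmetic and can be checked directly. The function and parameter names are hypothetical; the values (1500 floating gates, every fourth one per pass, three bits per cell, nine pulses, and 32 bytes per pulse for two-state products) are the ones quoted in the text.

```python
def multistate_rate(n_fg: int = 1500, fg_stride: int = 4,
                    bits_per_cell: int = 3, pulses: int = 9):
    """Return (cells at once, logical bits at once, bits per pulse, bytes per pulse)."""
    cells_at_once = n_fg // fg_stride            # physical bits operated on together
    logical_bits = cells_at_once * bits_per_cell
    bits_per_pulse = logical_bits / pulses
    return cells_at_once, logical_bits, bits_per_pulse, bits_per_pulse / 8


cells, bits, per_pulse, bytes_per_pulse = multistate_rate()
print(cells, bits, per_pulse, bytes_per_pulse)   # 375 1125 125.0 15.625
print(32 / bytes_per_pulse)                      # 2.048, i.e. about a factor of two
```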

As described above in this portion of this specification, the cell-by-cell column oriented steering approach, realizable in the two source side injection cell embodiments (standard and dual floating gate embodiments), significantly increases the performance of high level multi-state, improving both its write and read speed. It achieves this by applying, in parallel, the custom steering conditions needed for the particular state of each cell. This offers a substantial reduction in the number of individual programming steps needed for write, and permits a powerful binary search methodology for read, without having to carry out full sequential search operations. Improved performance is further bolstered through increased chunk size, made possible here via the low current source-side injection mechanism, which allows every fourth floating gate element to be operated on simultaneously.

Although specific examples of array and segmentation architectures have been described, a wide variety of alternative options offering similar capabilities are possible.

Combining the above concepts with the previously proposed A to D type sensing approaches, which support the greatest density of multi-state or "logical scaling" within a cell, offers a powerful approach to achieving cost reduced, performance competitive mass storage memories, appropriate to the Gigabit density generation of products. For example, by achieving effective programming and read rates of about 50% of those of two-state operation, this substantially bridges the gap between multi-state and two-state performance, so much so that when the remaining overhead is included (i.e. those portions not directly related to chunk read or program/verify steps), performance differences from two-state operation can become, for all practical purposes, a non-issue. Combined with the 8 to 16 level (3 to 4 bits) per cell capability, this translates to realizing competitively performing ultra-high density mass storage at a fraction of the cost per Megabyte (from one half to one third) of equivalent binary encoded memory.

Cell Erase Operation - Erase Distribution Tightening

7 The independent, bit line oriented steering feature described earlier is, in 

8 certain embodiments, exploited to significantly tighten an initially wide erased cell 

9 population distribution. In a mass storage memory based on the memory cell/array 

10 implementations shown in Figures 19 and 20, all cells in a sector or group of sectors 

11 are erased simultaneously, by applying a sufficiently high positive bias on the poly3 

12 erase electrode relative to the poly2 steering potential. This results in electron 

13 tunneling from the poly1 floating gates to the poly3 erase anode(s), as is described in

14 the aforementioned copending U.S. patent application serial no. 08/607,951. 

15 An important feature in this embodiment is the capacitive coupling of the 

16 combined channel/drain component. It is designed to have a relatively low coupling 

17 to the floating gate as compared to the steering element, thereby having only weak 

18 impact with respect to the various cell operations, including erase. For example, if 

19 the channel potential during erase is the same as that of poly2 (e.g. both at ground), 

20 the channel will provide only a slight assist to the steering gate in the erasing 

21 operation, resulting in a slightly stronger erase, while if its potential is more positive 

22 than that of the steering gate (e.g. the steering gate bias is lowered negatively, for 

23 example to about -7v, during erase, with the poly3 erase level lowered the same 

24 amount, while the channel potential remains at ground), it will contribute slightly less 

25 to erase. Nevertheless, once the poly3 is raised to the erasing potential, the main 

26 contributor to erasing a cell is the steering element and its potential. 
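The relative influence of each electrode can be made explicit with the standard lumped-capacitance model of a floating gate. The symbols are assumptions introduced for illustration and are not taken from the document: $C_{st}$, $C_{er}$ and $C_{ch}$ are the floating gate's couplings to the poly2 steering gate, the poly3 erase electrode and the combined channel/drain; $C_T = C_{st} + C_{er} + C_{ch}$ (other parasitics neglected); and $Q_{FG}$ is the stored charge.

$$
V_{FG} = \frac{Q_{FG} + C_{st}V_{st} + C_{er}V_{er} + C_{ch}V_{ch}}{C_T}
$$

Tunneling erase is driven by the voltage across the poly3 to floating gate dielectric, $V_{er} - V_{FG}$. Shifting both the steering and erase levels by the same amount $\delta$ while the channel stays at ground changes this drive by

$$
\Delta(V_{er} - V_{FG}) = \delta - \frac{(C_{st} + C_{er})\,\delta}{C_T} = \delta\,\frac{C_{ch}}{C_T}.
$$

With $\delta = -7$v, as in the example above, the drive drops by only $7\,C_{ch}/C_T$ volts. This quantity is small because $C_{ch}$ is designed to be small relative to $C_{st}$, which corresponds to the "slightly less" erase contribution described in the text. In the same model, changing $V_{st}$ alone by $\Delta V_{st}$ changes the drive by $-\Delta V_{st}\,C_{st}/C_T$. This is why the steering potential dominates.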

27 This strong dependence on steering gate potential provides a direct means for 

28 controlling the degree of erase on each cell, individually, in the column oriented 

29 steering embodiment. Operation is as follows. At the start of the erase operation, all 

30 steering lines are biased at their erase enabling potential (e.g. -7v), and a selected row 

31 to be erased (generally this would be one row of a group of rows targeted for erase) 

32 is pulsed to a sufficiently positive potential (e.g. 5v) to start the cell erasing process 
1 (removing a portion of the electrons from some or all of the floating gates), but which

2 is insufficient to erase any of the cells within that row to the required full erase 

3 margin. Once pulsing is completed, the row is biased into a read-at-erase-margins 

4 condition, and each cell is checked to see whether it has erased to that margin or not. 

5 For any cells which have so erased (as will occur after subsequent erase pulses), their 

6 corresponding steering lines will thereafter be biased into a non-erase-enabling or 

7 "lock-out" condition (e.g. at 0v) for all subsequent erase pulsing to that row during

8 the remainder of that erasing session. This feature can be accomplished by flipping 

9 latches associated with each of the bit/steering line columns. If one or more cells are 

10 still not sufficiently erased, the erase pulse is repeated, preferably at an incrementally 

11 higher poly3 voltage (e.g. 0.5v higher, although increasing time is used in an 

12 alternative embodiment), again followed by the read-at-erase-margins operation. 

13 This pulse/checking loop is repeated as necessary until all cells become 

14 sufficiently erased (or until some other limit, such as a maximum voltage or pulse count,

15 is reached, at which time defect management options are invoked), terminating the

16 erase operation to that row. This procedure is then repeated on all the other rows 

17 targeted for erase, one row at a time, until all rows/sectors so targeted are erased. 
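The row-by-row erase/verify/lock-out sequence can be summarized in a short Python model. It rests on simplifying assumptions that are not in the document. Each cell is reduced to a single hypothetical number, the poly3 level at which it first passes the read-at-erase-margins check. A pulse at a given level is assumed to erase exactly those enabled cells whose required level it reaches. The `locked` list stands in for the per-column latches that switch a steering line from its erase-enabling bias to the lock-out bias.

```python
from typing import List, Optional, Tuple


def erase_row(required_v: List[float], start_v: float, step_v: float = 0.5,
              max_v: float = 25.0) -> Tuple[int, List[Optional[float]], List[int]]:
    """Pulse/verify/lock-out loop for one row.

    required_v: hypothetical poly3 level each cell needs to reach erase margin.
    Returns (pulses applied, poly3 level at which each cell locked out,
    indices of cells that never verified and need defect management).
    """
    locked = [False] * len(required_v)          # one latch per bit/steering column
    locked_at: List[Optional[float]] = [None] * len(required_v)
    v = start_v
    pulses = 0
    while not all(locked) and v <= max_v:
        pulses += 1
        # Erase pulse: only unlocked columns see the erase-enabling steering
        # bias; afterwards every cell is read at erase margins.
        for i, need in enumerate(required_v):
            if not locked[i] and need <= v:
                locked[i] = True                # flip latch: steering to lock-out bias
                locked_at[i] = v
        v += step_v                             # next pulse at a higher poly3 level
    failed = [i for i, done in enumerate(locked) if not done]
    return pulses, locked_at, failed


def erase_sector(rows: List[List[float]], start_v: float) -> List[int]:
    """Erase the targeted rows one at a time; returns pulses used per row."""
    return [erase_row(row, start_v)[0] for row in rows]
```

For example, `erase_row([5.0, 5.4, 6.2], start_v=5.0)` uses four pulses and locks the cells out at 5.0, 5.5 and 6.5. Provided the first pulse is below every cell's required level, as the text specifies, no cell in this model is driven more than one step past its own requirement. This is the tight distribution the text describes.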

18 In this way all cells in a sector or group of sectors are both sufficiently erased, 

19 and confined to a targeted, tight erase distribution. This capability reduces wear 

20 under repeated write cycling, thereby increasing endurance. It is especially useful in 

21 speeding up multi-state programming operations following erase, since now time does 

22 not have to be expended in bringing heavily overerased cells up to that sufficiently 

23 erased condition. 

24 The drawback of this embodiment is that erasing becomes much more time 

25 consuming, replacing potentially one single erase pulse applied to all rows (or sectors) 

26 simultaneously, with a series of erase pulse/check operations on a row by row basis, 

27 since now only a single row can be erased at a time. This approach is most practical 

28 when the time associated with erase is hidden, eliminating its impact on write 

29 performance. Today there are a number of ways in which mass storage systems 

30 eliminate erase related performance loss, including erase ahead approaches and 

31 dynamic address mapping via RAM translation tables. In such systems, a tight erase 

32 distribution at the start of write can measurably increase write performance, especially 
1 with respect to multi-state.

2 The above discussion assumes that each steering line is uniquely associated 

3 with one cell. However, because of layout pitch constraints, especially when 

4 implemented in a segmented steering architecture, several cells may share one global 

5 steering signal, examples of which are shown in Figures 19 and 20, where each pair

6 of cells is associated with one global steering line via steering drive segment transfer

7 select transistors. Following are two embodiments utilizing such sharing. 

8 One embodiment allows the sharing to take place in each erase operation, 

9 erasing all cells in one row simultaneously, as described above. In this case, 

10 however, erase lock-out on a group of cells (or floating gate transistors in the case of 

11 dual floating gate cells) sharing a common steering line can only be invoked when all 

12 cells in that group have achieved the required erased state margin. This will result 

13 in a fraction of the cells becoming overerased as they wait for the weakest cell in each 

14 group to achieve sufficient erasure. For example, if each sharing group consists of 

15 four cells, in general three cells will become overerased. Figure 21 models the impact 

16 of this sharing approach on a population of 5000 cells, the erase voltages of which 

17 follow a normal distribution with a one-sigma of 0.7v. In the case of two-cell 

18 sharing, 50% of the cells will have minimal overerase, and the remainder will follow 

19 a normal distribution with a one-sigma of about 1v. Comparing this to the original

20 distribution (i.e. without any lock-out) shows that with lock-out far fewer cells are

21 subjected to overerasure, at any level of overerase (i.e. they are further up the sigma 

22 tail), and the worst case overerase voltage is about 1.3v lower than the original 

23 distribution's worst case overerase of about 4.7v. The situation is similar in the case 

24 of four-cell sharing, with slightly higher levels of overerase than those of two-cell

25 sharing. 

26 A second embodiment takes advantage of the segment level selection capability, 

27 thereby completely avoiding the sharing limitation. Referring specifically to the 

28 previously described embodiments, wherein one global steering line is shared by two 

29 local steering lines (e.g. Figures 19 and 20), the present embodiment exploits the 

30 segment steering line addressing capability to only drive one of the two local steering 

31 lines in each cell pair (or half the row's worth of cells) during each erase operation. 

32 The unaddressed cells' local steering lines are precharged and floated at the non-erase- 
1 enabling voltage condition (e.g. 0v). Once the addressed half row's worth of cells are

2 taken through their erase/verify/lockout operations to completion, the steering address 

3 is shifted to the other, previously unaddressed cell group half, which are then erased 

4 to completion, while the first group of cells are maintained in the non-erase-enabling 

5 condition. Although this approach doubles the total erase time compared to using a 

6 single erase pulse for the entire row, it will have no impact on write performance in

7 erase-hidden implementations, while it does maintain the desirably tight erase 

8 distribution. 
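The two-phase scheme can be expressed as a steering-bias schedule. In this minimal Python sketch, all names are hypothetical. Even and odd cell indices stand for the two local steering lines that share one global steering line. The -7v erase-enabling and 0v non-erase-enabling levels are the example values given above, and each cell is modeled by a hypothetical required poly3 level as before. Each phase runs its own pulse/verify/lock-out loop while the other half is held at the non-erase-enabling level.

```python
from typing import List, Tuple

ERASE_ENABLE_V = -7.0   # example erase-enabling steering level from the text
NON_ERASE_V = 0.0       # example non-erase-enabling (lock-out or floated) level


def steering_biases(n_cells: int, phase: int, locked: List[bool]) -> List[float]:
    """Local steering bias for each cell during one erase pulse.

    Cells whose index parity differs from `phase` belong to the unaddressed
    half and sit at the precharged, floated non-erase level; addressed cells
    are enabled unless their lock-out latch has been set.
    """
    return [ERASE_ENABLE_V if (i % 2 == phase and not locked[i]) else NON_ERASE_V
            for i in range(n_cells)]


def erase_row_in_halves(required_v: List[float], start_v: float,
                        step_v: float = 0.5, max_v: float = 25.0) -> Tuple[int, int]:
    """Erase one half of the row to completion, then the other.

    required_v: hypothetical poly3 level needed by each cell.
    Returns the number of pulses used by each phase.
    """
    n = len(required_v)
    locked = [False] * n
    pulses_per_phase = []
    for phase in (0, 1):
        v, pulses = start_v, 0
        while not all(locked[i] for i in range(phase, n, 2)) and v <= max_v:
            pulses += 1
            biases = steering_biases(n, phase, locked)
            for i in range(n):
                if biases[i] == ERASE_ENABLE_V and required_v[i] <= v:
                    locked[i] = True   # verified: latch switches to lock-out
            v += step_v
        pulses_per_phase.append(pulses)
    return pulses_per_phase[0], pulses_per_phase[1]
```

The row's erase time is the sum of the two phases. That sum accounts for the roughly twofold increase described above, which buys independent lock-out of every cell.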

9 In an alternative embodiment, the above controlled overerase methodology is 

10 used to write the multi-state data, with the hot electron programming mechanism 

11 relegated to the data unconditional preset operation. While optimum write bias 

12 conditions and disturb prevention would depend on specific cell and tunneling 

13 characteristics, such a tunneling based write approach is made possible by the 

14 fundamental cell array architecture, consisting of the independently controllable 

15 column steering feature, plus the bit-by-bit lock-out capability of the above disclosed 

16 memory concept relating to Figures 19 and 20. 
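The idea of writing multi-state data by controlled overerase can be illustrated by applying the same lock-out mechanism against per-cell targets. The sketch below is hypothetical throughout. It assumes that every cell has first been preset to a high threshold by hot electron programming, that each erase pulse lowers the threshold of every enabled cell by a fixed amount, and that each cell is locked out once its threshold reaches the level assigned to its data state. As the text notes, real bias conditions and disturb behavior depend on the specific cell and tunneling characteristics.

```python
from typing import List, Tuple


def tunnel_write(targets_v: List[float], preset_v: float, step_v: float = 0.25,
                 max_pulses: int = 100) -> Tuple[List[float], int]:
    """Lower each preset cell's threshold by erase pulses until it reaches
    its own target, then lock it out (steering at the non-erase level).

    targets_v: hypothetical threshold level for each cell's data state.
    Returns (final thresholds, pulses applied).
    """
    vt = [preset_v] * len(targets_v)
    # Cells whose target is the preset (highest) state need no erase at all.
    locked = [t >= preset_v for t in targets_v]
    pulses = 0
    while not all(locked) and pulses < max_pulses:
        pulses += 1
        for i in range(len(vt)):
            if not locked[i]:
                vt[i] -= step_v               # enabled cell: electrons tunnel off
                if vt[i] <= targets_v[i]:     # verify against this cell's target
                    locked[i] = True
    return vt, pulses
```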
17 

18 A variety of alternative embodiments of this invention have been taught, which 

19 provide improved performance and cost efficiency for multi-state memory devices and 

20 systems. The invention now being fully described, it will be apparent to one of 

21 ordinary skill in the art that many changes and modifications can be made thereto 

22 without departing from the spirit or scope of the appended claims. 

23 All publications and patent applications mentioned in this specification are 

24 herein incorporated by reference to the same extent as if each individual publication 

25 or patent application was specifically and individually indicated to be incorporated by 

26 reference. 